Proposal for Research in
Quantitative  Bioassay  Methodology, 
and  Risk  Analysis  and  Characterization 

I.  INTRODUCTION  AND  BACKGROUND 

The objectives of the above project were formulated in discussion with Mr. Henry
Gardner of the U.S. Army Medical R&D Command, Ft. Detrick, Maryland. The project
purpose and work scope were stated in the proposal as follows: to perform mathematical,
statistical, and risk-analytical work in support of the mission of the Army Biomedical
Research and Development Laboratory (ABRDL).

II.  APPROACHES  TAKEN  AND  PROGRESS 

We  have  analyzed  data  obtained  from  other  researchers  supported  by  ABRDL.  We 
have  also  developed  preliminary  mathematical  models  to  summarize  experimental 
findings  by  ABRDL-supported  researchers.  Brief  descriptions  of  our  work  are  given 
below. 

A.  Analysis  of  Data  from  the  Pilot  Study  in  Medaka  Conducted  by  Gulf  Coast 
Research  Laboratory 

Dr.  Marilyn  G.  Wolfe  of  Experimental  Pathology  Laboratories,  Inc.  sent  us  summary 
incidence  tables  for  selected  liver  neoplasms  from  a  pilot  study  in  medaka  conducted  by 
Gulf  Coast  Research  Laboratory.  The  study  was  entitled  "Dose  response  relationships 
for  hepatocarcinogenesis  in  medaka  (Oryzias  latipes)  exposed  to  waterborne 
N-Nitrosodiethylamine  [DEN]." 

The data analysis reported in Appendix 1 uses the summary table of incidences of
hepatocellular neoplasms (adenoma[s] and/or carcinoma[s]) combined. There are 5
treatment groups in the study: a control, and those with concentration levels of
2.5 mg/L, 5.0 mg/L, 10.0 mg/L, and 20.0 mg/L. Each treatment group is assigned 4
tanks of medaka. There is one experiment which uses medaka that are 6 days of age at
the start of the experiment and another experiment which uses medaka that are 52 days
of age at the start of the experiment. A number of fish from each tank are sacrificed at 4,
6, and 9 months. The livers are examined and the number of fish exhibiting
hepatocellular neoplasms (adenoma[s] and/or carcinoma[s]) is recorded.

Graphical  displays  of  the  data  suggest  that  the  log  odds  of  at  least  one  neoplasm 
occurring  is  roughly  linearly  increasing  with  concentration  and  time  of  sacrifice.  The 
displays  also  suggest  that  the  age  of  the  medaka  at  the  start  of  the  experiment  has  an 
effect.  The  medaka  that  are  of  age  6  days  at  the  start  of  the  experiment  tend  to  have 
higher  incidence  of  at  least  one  neoplasm  than  those  of  age  52  days  at  the  beginning  of 
the experiment. This suggests that young fish are more sensitive to concentration
than old fish, which may favor the use of young fish as biological detectors of
toxins.

Data with covariates and binary responses are often usefully described using a
logistic regression model; cf. Collett (1991). If $x_{i1}, \ldots, x_{in}$ denote the values
of covariates (e.g. sacrifice time, concentration) for the $i$th animal, then

$$P\{i\text{th animal has at least one neoplasm} \mid x_{i1}, \ldots, x_{in}\} = \left[1 + \exp\{\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_n x_{in}\}\right]^{-1}.$$

Appendix  1  describes  the  results  of  using  the  logistic  regression  model  to  explore 
incidences  of  hepatocellular  neoplasms  as  a  function  of  concentration  and  sacrifice  time. 
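
As a minimal sketch of the model above, the following Python function evaluates the probability of at least one neoplasm under the sign convention that appears to be used in this report: the linear predictor sits inside 1/(1 + exp(.)), so a negative coefficient raises the probability. The function name and the coefficient values are illustrative assumptions, not estimates from the study.

```python
import math

def neoplasm_probability(betas, covariates):
    """P(at least one neoplasm) = 1 / (1 + exp(sum_j beta_j * x_j)).

    The linear predictor is the log odds of a neoplasm NOT occurring,
    so negative coefficients raise the probability of a neoplasm.
    """
    linear = sum(b * x for b, x in zip(betas, covariates))
    return 1.0 / (1.0 + math.exp(linear))

# Hypothetical coefficients for (constant, sacrifice time in months, concentration in mg/L).
betas = (5.0, -0.5, -0.1)
low = neoplasm_probability(betas, (1.0, 4.0, 2.5))    # early sacrifice, low concentration
high = neoplasm_probability(betas, (1.0, 9.0, 20.0))  # late sacrifice, high concentration
print(f"low: {low:.3f}, high: {high:.3f}")
```

With these made-up coefficients the probability rises with both sacrifice time and concentration, which is the qualitative pattern described for the medaka data.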

A  statistical  model  can  not  only  be  used  to  describe  data  but  also  be  used  to  predict 
data.  Appendix  1  also  presents  results  which  explore  the  ability  of  certain  logistic 
regression  models  estimated  with  one  part  of  the  data  to  predict  the  other  part.  Of 
particular  interest  are  models  estimated  with  higher  concentration  data  and  used  to 
predict lower concentration results, and models estimated with data from early sacrifice
times and used to predict data from later sacrifice times. The results suggest that the
models have some ability to predict. However, additional data at early sacrifice times
and lower concentrations are needed to improve the accuracy and usefulness of the
statistical  models.  Careful  choice  of  concentration  levels  can  be  expected  to  improve  the 
quality  of  the  low-concentration  response  predictions. 

B.  Analysis  of  Data  From  a  Cell  Proliferation  Study  Using  Medaka  (Oryzias  latipes) 

Appendices 2-4 report data analyses of an experiment using Japanese medaka to
study cell proliferation under different concentrations of DEN and TCE. Appendices 2-3
report analyses of summary data.

The  medaka  are  exposed  to  differing  levels  of  DEN  and  TCE.  Each  treatment  group 
has  two  replicate  tanks.  Eight  animals  in  each  tank  were  sacrificed  on  4  August  1993; 
this  is  sacrifice  B.  Eight  additional  animals  in  each  tank  were  sacrificed  on  20  August 
1993;  this  is  sacrifice  D.  Each  sacrificed  fish  was  exposed  to  BrdU  for  72  hours  prior  to 
sacrifice;  any  cell  that  is  in  S-phase  during  this  time  has  a  BrdU  marker.  Each  sacrificed 
fish  is  frozen  and  sliced  longitudinally  into  7-micron  sections. 

Five  slices  containing  a  portion  of  liver  are  considered  for  each  fish.  An  agent  was 
used  that  stains  the  nuclei  with  the  BrdU  marker  black;  these  nuclei  are  called  positive.  A 
region  of  interest  (ROI)  is  marked  on  the  slice;  the  ROI  is  chosen  so  as  to  maximize  the 
number  of  hepatocytes  and  minimize  the  number  of  nonhepatocytes  present. 

For half the fish in sacrifice B, two measures of cell proliferation were estimated: a
count index (CI) and an area index (AI). The count index for a slice is the number of
positive hepatocytes in the ROI divided by the number of hepatocytes in the ROI,
multiplied by 100. Evaluating the count index is very labor-intensive. The area index
is the area of positive nuclei in the ROI divided by the area of the hepatocytes in the
ROI, multiplied by 100. The AI is easier to obtain; however, it does not quantify cells
in S-phase exactly as the CI does, since cells can differ in size. Graphical analysis
indicates a satisfactory linear relationship between count index and area index,
indicating that AI and CI are generally measuring the same response. However, the
variability of the area index increases as the count index increases; this increase is
generally associated with high DEN and TCE concentration levels. This association
suggests that cell sizes become more variable under high concentrations of DEN and TCE.
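
The two indices defined above can be sketched in a few lines of Python. The function names and the slice measurements are hypothetical and serve only to make the definitions concrete.

```python
def count_index(positive_hepatocytes, total_hepatocytes):
    """Count index: percentage of hepatocytes in the ROI that are BrdU-positive."""
    return 100.0 * positive_hepatocytes / total_hepatocytes

def area_index(positive_nuclei_area, hepatocyte_area):
    """Area index: percentage of hepatocyte area in the ROI covered by positive nuclei."""
    return 100.0 * positive_nuclei_area / hepatocyte_area

# Hypothetical measurements for one slice (areas in arbitrary but matching units).
ci = count_index(12, 400)
ai = area_index(90.0, 6000.0)
print(f"CI = {ci:.2f}, AI = {ai:.2f}")
```

The two values need not coincide for the same slice, because the AI depends on nuclear and cell size as well as on the number of positive cells.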

Data from the experiment arrived in three phases. Initially only summary data were
available. Appendices 2-3 describe results of data analysis using summary measures of
area indices of 3-5 slices for each fish. Appendix 4 describes results of data analysis
using the area indices of the individual slices for each fish. The results can be
summarized briefly, and simplistically, as follows. Although there is a statistically
significant difference between mean responses of (the square root of) the area index to
the various treatments with DEN and TCE, no simple and interpretable dose-response
patterns have been found. In particular, response does not appear to increase (or
decrease) systematically with dose, where "dose" includes time of exposure as well as
chemical concentration level. It remains to be seen whether this inconclusiveness will
be lessened by the analysis of more data (later sacrifices) or by finding that
experimental problems or biases occurred. A more exciting possibility is that the
observed dose-responses can be explained by biological mechanism and that the findings
essentially reappear when further experiments and data analyses are conducted.

C.  Analysis  of  Female  Breast  Tissue  Data  in  Order  to  Predict  Cancer 

In  October  1994,  Dr.  D.  Malins  sent  us  data  dated  10/25/94  from  a  study  of  female 
breast  tissues.  The  data  are  measurements  from  breast  tissue  samples  from  30  female 
patients.  Fifteen  of  the  patients  underwent  reduction  mammoplasty;  tissues  from  these 
patients  are  considered  to  be  normal.  The  other  15  patients  had  invasive  ductal 
carcinoma. The study used multiple breast tissue samples from some patients.
The data considered are measurements of fapyadenine (Fapy-A), 8-hydroxyadenine
(8-OH-A),  fapyguanine  (Fapy-G),  and  8-hydroxyguanine  (8-OH-G)  from  breast  tissue 
samples. 

Appendix  5  reports  the  results  of  an  analysis  of  data  consisting  of  68  samples  from 
women  who  underwent  reduction  mammoplasty  and  10  samples  from  invasive  ductal 
tumors for a total of 78 samples. Of interest was the ability of the covariates Fapy-A,
8-OH-A, Fapy-G, and 8-OH-G to predict the occurrence of cancerous/normal tissue.
Logistic  regression  models  were  used  in  the  study. 

One  way  to  evaluate  the  usefulness  of  a  statistical  model  is  to  evaluate  how  well  it 
describes  the  data  used  to  estimate  the  parameters  of  the  model  (goodness-of-fit). 
Another  way  is  to  evaluate  how  well  the  statistical  model  predicts  new  data  that  was 
not  used  to  estimate  its  parameters.  This  latter  process  is  called  cross  validation;  it  is  a 
natural  and  well-accepted  procedure  for  assessing  the  quality  of  a  proposed  prediction 
methodology.  Mosteller  and  Tukey  (1977)  give  a  good  discussion. 

Simulation is used for cross validation. In each simulation replication, the data are
randomly  allocated  to  one  of  two  data  sets.  One  data  set  is  used  to  estimate  the 
parameters  of  a  logistic  regression  model.  The  estimated  model  is  then  used  to  predict 
the  probability  that  a  data  point  from  the  other  data  set  is  from  a  cancerous  tissue. 
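
The random-split procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for Appendix 5: the data are simulated and hypothetical, there is a single covariate, the fit uses plain gradient ascent for simplicity, the model is written in the common form P(cancer | x) = 1/(1 + exp(-(a + b x))) (the opposite sign convention to the appendix formulas), and a sample is classified as cancerous when its predicted probability is at least 0.5.

```python
import math
import random

def sigmoid(z):
    """Numerically safe logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, steps=500, rate=0.5):
    """Fit P(y = 1 | x) = sigmoid(a + b * x) by gradient ascent on the mean log-likelihood."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(a + b * x)
            grad_a += resid
            grad_b += resid * x
        a += rate * grad_a / n
        b += rate * grad_b / n
    return a, b

def split_validate(xs, ys, replications=20, seed=0):
    """Randomly split the data in two, fit on one part, score the other; repeat."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    error_rates = []
    for _ in range(replications):
        rng.shuffle(idx)
        half = len(idx) // 2
        train, test = idx[:half], idx[half:]
        a, b = fit_logistic([xs[i] for i in train], [ys[i] for i in train])
        wrong = sum((sigmoid(a + b * xs[i]) >= 0.5) != (ys[i] == 1) for i in test)
        error_rates.append(wrong / len(test))
    return error_rates

def make_hypothetical_data(n=40, seed=1):
    """Simulated covariate values; label 1 plays the role of a cancerous sample."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        xs.append(rng.gauss(1.0 if y else -1.0, 1.0))
        ys.append(y)
    return xs, ys

xs, ys = make_hypothetical_data()
rates = split_validate(xs, ys)
print(f"mean held-out error rate: {sum(rates) / len(rates):.3f}")
```

Averaging the held-out error over many random splits gives a measure of predictive quality that does not reward a model merely for having more covariates.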

A summary of the results is as follows. Of the logistic regression models considered,
the one with the best goodness-of-fit is, not surprisingly, the one with the greatest
number of covariates: constant, log (Fapy-A), log (8-OH-A), log (Fapy-G), and
log (8-OH-G). However, this model tends to be the weakest predictor; it can be viewed
as overfitting the data. Two logistic regression models are better predictors: Model I
has the covariates constant, log (Fapy-A), and log (8-OH-A); Model II has the
covariates constant and log (Fapy-A/8-OH-A). Model II tends to predict the occurrence
of normal samples somewhat better than Model I does. However, Model I predicts cancer
samples somewhat better than Model II. That is, Model I tends to give more false
positives than Model II, whereas Model II tends to give more false negatives than
Model I. All the logistic regression models considered had more difficulty predicting
the cancer samples than the normal samples: on average, about 1 out of 5 cancer
samples was incorrectly predicted.
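
To make the false-positive and false-negative comparison concrete, here is a small Python sketch. The labels and predictions are hypothetical; they are chosen so that 1 of 5 cancer samples is missed, matching the rough figure quoted above, and are not data from the study.

```python
def error_rates(actual, predicted):
    """Return (false positive rate, false negative rate); 1 = cancer, 0 = normal."""
    false_pos = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    normals = sum(1 for a in actual if a == 0)
    cancers = sum(1 for a in actual if a == 1)
    return false_pos / normals, false_neg / cancers

# Hypothetical held-out samples: five cancerous, then five normal.
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
fp_rate, fn_rate = error_rates(actual, predicted)
print(f"false positive rate = {fp_rate:.2f}, false negative rate = {fn_rate:.2f}")
```

A model that misclassifies more normal samples has the higher false positive rate; one that misses more cancer samples has the higher false negative rate.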

D. Analysis of Data Sent by Dr. L. Twerdok of GEO-CENTERS, INC. at U.S. Army BRDL in October 1994

The  data  consist  of  measurements  made  on  medaka  that  were  sacrificed  at  different 
times. The information recorded for each fish includes: the date of sacrifice; the age; the
length  in  millimeters;  the  weight  in  milligrams;  percent  hematocrit;  percent  leukocrit; 
and  the  hatch  date.  These  data  are  from  the  beginning  of  a  study  monitoring  the  health 
status  of  medaka  used  in  toxicology  studies. 

Appendix 6 reports results of the data analysis. The analysis suggests that there is
an association between the leukocrit measurement and the sacrifice date. This association
may be due to the procedures used to measure leukocrit; it may also be due to
differences in water quality and other physical factors in the experiment. The presence
of a "sacrifice effect" may detrimentally affect the ability to measure effects of
treatments across sacrifices. Sacrifice effects will tend to dilute the strength of
associations between measured variables and treatments.

E.  Mathematical  Models 

Appendix 7 presents preliminary simple mathematical models to describe, in a
quantitative way, the behavior of data presented by Dr. Judith Zelikoff at the U.S.
ABRDL Research Project Review held on September 20-21, 1994. The models describe
the production of the free radical $O_2^-$ that is stimulated by the introduction of an
initial concentration of PMA at time zero. The production is made observable by
surrounding the cells exposed to PMA with a solution of luminol. These models are a
first step toward providing a quantitative concentration-response relationship. The
mathematical models could also be used to simulate experiments before they are done.
Such simulations could improve the design and cost-effectiveness of the experiments.

Appendix 8 presents a preliminary model for the occurrence and repair of cell
adducts caused by a dose stimulus. This model was motivated by discussions with
Dr. J. G. Burkhart of the Environmental Toxicology Program, National Institute of
Environmental Health Sciences. The model is also relevant to the analysis of female
breast data from Dr. D. Malins.

We propose to spend more effort on such model construction and exploration in the
future. Incorporation of biological mechanism into dose-response relationships is
expected  to  provide  better  predictions  of  toxic  dose  effect  than  does  the  use  of  simple 
standard  statistical  methods. 

III.  CONCLUSIONS 

During  the  past  12  months  we  have  participated  in  the  analysis  of  data  obtained 
from  other  investigators  supported  by  ABRDL  and  initiated  the  development  of 
mathematical  models  in  collaboration  with  some  investigators  to  summarize  and 
predict  data  in  a  more  meaningful  way. 

The  analysis  of  experimental  data  has  focused  attention  on  the  importance  of 
controlling  experimental  conditions  so  as  to  minimize  unwanted  sources  of  variability 
such  as  tank  effects  and  time  of  sacrifice  effects.  Unless  these  sources  of  variability  are 
controlled  or  adjusted  for,  they  will  tend  to  dilute  the  strength  of  inferred  associations 
between  measured  variables  and  treatments.  The  study  initiated  by  Dr.  Twerdok  should 
be  helpful  in  controlling  these  unwanted  sources  of  variability.  The  design  of 
experiments  should  also  be  sensitive  to  these  sources  of  variability. 

The  usefulness  of  a  statistical  model  can  be  evaluated  in  two  ways.  One  is  to 
evaluate  how  well  it  describes  the  data  used  to  estimate  the  parameters  of  the  model 
(goodness-of-fit). Another way is to evaluate how well the statistical model predicts
new data that is not used to estimate its parameters (cross validation). It is important to
incorporate  this  latter  evaluation  in  the  analysis  of  data. 

Mathematical  and  statistical  models  are  important  not  only  because  they  summarize 
current  data  but  also  because  they  can  be  used  to  predict.  For  example,  they  can  assist  in 
planning future experiments. A model can assist in determining the number of animals
needed, the number of sacrifice times, and when the sacrifices should occur in order to
estimate a dose-response relationship.

We  propose  to  continue  to  perform  mathematical,  statistical  and  risk-analytical  work 
in support of ABRDL.

APPENDIX 1


An  Exploratory  Analysis  of  Data  from  a  Mega-Medaka  Study 

by 

D.  P.  Gaver 
P. A. Jacobs

1.  Introduction;  Experimental  Situation 

This  paper  reports  the  results  of  an  exploratory  data  analysis  of  the  combined 
incidences  of  hepatocellular  neoplasms  and  carcinomas  from  the  summary 
incidence  tables  for  selected  liver  neoplasms  from  the  pilot  study  in  medaka 
conducted  by  Gulf  Coast  Research  Laboratory.  The  study  was  entitled  "Dose 
response relationships for hepatocarcinogenesis in medaka (Oryzias latipes)
exposed  to  waterborne  N-Nitrosodiethylamine  [DEN]."  The  data  used  for  the 
data  analysis  reported  here  appear  as  Appendix  C. 

There  are  5  treatment  groups  in  the  study;  a  control  and  those  with 
concentration  levels  of  2.5  mg/L,  5.0  mg/L,  10.0  mg/L  and  20.0  mg/L.  Each 
treatment  group  is  assigned  4  tanks  of  medaka.  There  is  one  experiment  which 
uses  medaka  that  are  6  days  of  age  at  the  start  of  the  experiment  and  another 
experiment  which  uses  medaka  that  are  52  days  of  age  at  the  start  of  the 
experiment.  A  number  of  fish  from  each  tank  are  sacrificed  at  4,  6  and  9  months. 
The  livers  are  examined  for  hepatocellular  neoplasms  (adenomas  and/or 
carcinomas).  The  number  of  fish  exhibiting  neoplasms  is  recorded.  Apparently 
several  pathologists  assessed  fish  condition;  we  analyzed  data  from  one  such 
assessment  (or  a  consensus  thereof). 

Sections  2  and  3  describe  a  graphical  analysis  of  the  data.  Section  4  describes 
the results of using a logistic regression model to estimate parameters. One
would hope that a statistical model would not only describe data well but could
also  be  used  to  predict.  In  Sections  5-7  we  explore  the  predictive  ability  of 
logistic  regression  models  estimated  using  part  of  the  data  to  predict  the  other 
part  of  the  data. 

2. A Graphical Exploratory Analysis

Figures  1  and  2  (respectively  3  and  4)  display  data  from  the  experiment  using 
medaka  of  age  6  days  (respectively  52  days)  at  the  start  of  the  experiment. 

Figure  1  (respectively  Figure  3)  displays  3  scatter  plots  for  medaka  of  age  6 
days  (respectively  52  days)  at  the  start  of  the  experiment:  one  for  each  sacrifice 
time.  The  plots  are  of  the  fraction  of  sacrificed  medaka  with  at  least  one 
neoplasm  versus  concentration.  A  comparison  of  the  scales  on  the  two  figures 
suggests  that  there  is  an  association  between  the  age  of  the  medaka  at  the  start  of 
the  experiment  and  the  response  to  concentration  and  time  of  sacrifice.  The 
medaka of age 6 days at the start of the experiment tend to have a greater incidence of
neoplasms. For the 6-day-old medaka, the shape of the association between neoplasm
incidence and concentration appears to change with sacrifice time: convex for sacrifice
times of 4 and 6 months, and concave for a sacrifice time of 9 months. Larger fractions
are associated
with  larger  concentrations  and  sacrifice  times. 

Figure 2 (respectively Figure 4) displays 5 scatter plots for medaka of age 6
days (respectively 52 days) at the start of the experiment, one for each
concentration. The plots are of the fraction of medaka with at least one neoplasm
versus sacrifice time. A comparison of the figures once again suggests an association
between the age of the medaka at the beginning of the experiment and the response to
concentration. The association between sacrifice time and fraction of medaka with the
abnormalities appears weaker for lower concentrations.

3. Logistic Model

The following logistic model is used for the exploratory data analysis.
Suppose $M$ fish are sacrificed at time $t$ and let $N$ be the number of fish whose
livers have neoplasms. We assume that $N$ has a binomial distribution with $M$
trials and probability $p(x)$ of at least one neoplasm occurring, (possibly) depending on
covariates $x = (x_1, x_2, \ldots, x_c)$ such as the concentration and time of sacrifice; that
is,

$$P\{N = n\} = \binom{M}{n} p(x)^n \left(1 - p(x)\right)^{M-n}, \quad n = 0, 1, \ldots, M \tag{3.1}$$

with

$$p(x) = \frac{1}{1 + e^{\theta(x)}} \tag{3.2}$$

where

$$\theta(x) = \beta_0 + \sum_{j=1}^{c} \beta_j x_j. \tag{3.3}$$

Note that for this model the log odds of a neoplasm not occurring is

$$\theta(x) = \log \frac{1 - p(x)}{p(x)}. \tag{3.4}$$

$\theta(x)$ is called the logistic transform. Thus, plots of the data version of the log odds
of a neoplasm not occurring versus potential covariates will give some idea of the
functional relationship between the covariates and the probability of no
neoplasm occurring.
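
The binomial model of this section can be sketched in Python as below. The sketch assumes the reading p = 1/(1 + exp(theta)), with theta the log odds of no neoplasm, which is consistent with the signs of the estimates reported in Section 5. The function names and numerical values are illustrative assumptions.

```python
import math

def neoplasm_probability(theta):
    """Equation (3.2) as read here: p = 1 / (1 + exp(theta))."""
    return 1.0 / (1.0 + math.exp(theta))

def binomial_pmf(n, m, p):
    """Equation (3.1): probability that n of m sacrificed fish have a neoplasm."""
    return math.comb(m, n) * p**n * (1.0 - p) ** (m - n)

def logistic_transform(p):
    """Log odds of a neoplasm NOT occurring."""
    return math.log((1.0 - p) / p)

theta = 0.7                       # hypothetical value of the linear predictor
p = neoplasm_probability(theta)
print(f"p = {p:.3f}, P(N = 3 of 10) = {binomial_pmf(3, 10, p):.3f}")
print(f"recovered theta = {logistic_transform(p):.3f}")
```

Applying the logistic transform to p recovers theta, which is why plots of empirical log odds against a covariate indicate the functional form of the linear predictor.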

The (empirical) log odds of a neoplasm not occurring,

$$\theta_i = \log \frac{n_i - o_i + \frac{1}{2}}{o_i + \frac{1}{2}}, \tag{3.5}$$

is computed for each tank, each sacrifice time, and each concentration, where $n_i$ is
the number of medaka sacrificed and $o_i$ is the number of sacrificed medaka
having at least one neoplasm.
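
A minimal Python sketch of the empirical log odds follows. It assumes that the constant added to numerator and denominator is 1/2, the usual correction that keeps the logarithm finite when a tank has no affected fish or only affected fish; the counts are hypothetical.

```python
import math

def empirical_log_odds_no_neoplasm(n_sacrificed, n_with_neoplasm):
    """log((n - o + 1/2) / (o + 1/2)); the 1/2 correction is an assumed reading."""
    return math.log((n_sacrificed - n_with_neoplasm + 0.5) / (n_with_neoplasm + 0.5))

# Hypothetical tanks of 10 sacrificed fish with 0, 5, and 10 affected fish.
for affected in (0, 5, 10):
    value = empirical_log_odds_no_neoplasm(10, affected)
    print(f"{affected:2d} of 10 affected: {value:+.3f}")
```

The value is positive when fewer than half the fish are affected, zero at exactly half, and negative beyond that.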

Figures 5-8 display plots of the $\theta_i$'s versus time of sacrifice and concentration.
Figures  5-6  display  the  plots  for  medaka  aged  6  days  at  the  beginning  of  the 
experiment.  Figures  7-8  display  plots  for  medaka  aged  52  days  at  the  beginning 
of  the  experiment. 

Figure  5  (respectively  Figure  7)  displays  5  scatter  plots  for  medaka  of  age  6 
days  (respectively  52  days)  at  the  start  of  the  experiment.  The  scatter  plots  are  of 
the  log  odds  of  a  neoplasm  not  occurring  versus  time  of  sacrifice,  one  scatter  plot 
for  each  concentration;  there  is  one  point  for  each  tank.  The  plots  indicate  that  the 
log  odds  of  a  neoplasm  not  occurring  is  roughly  linearly  decreasing  with  time  of 
sacrifice. 

Figure  6  (respectively  Figure  8)  displays  3  scatter  plots  for  medaka  of  age  6 
days  (respectively  52  days)  at  the  start  of  the  experiment.  The  scatter  plots  are  of 
the  log  odds  of  a  neoplasm  not  occurring  versus  the  concentration,  one  scatter 
plot  for  each  sacrifice  time;  there  is  one  point  for  each  tank.  The  plots  indicate 
that  the  log  odds  of  a  neoplasm  not  occurring  is  roughly  linearly  decreasing  with 
concentration. 

4. Logistic Analysis Conditional on Age, and By Concentration

Table 1 presents results of an analysis of the experiment in which the fish
were of age 6 days at the start of the experiment. The data for each tank are used
to fit a logistic model in which the probability that a fish develops at least one
neoplasm by $t$ time units into the experiment has the form

$$\frac{1}{1 + e^{\beta_0 + \beta_1 t}}.$$

Table 1 displays the estimates of $\beta_0$ and $\beta_1$ for each tank in each treatment group
and the estimates of $\beta_0$ and $\beta_1$ obtained by combining all the tanks in each
treatment group. Within each treatment group, the four tanks are labeled 1 through 4.
Also displayed are the standard errors of the estimates as computed using asymptotic
Fisher information. This is a standard biostatistical methodology. An alternative,
re-sampling, or "the bootstrap method", might well provide more reliable results.
Note that as the concentration increases the estimates of $\beta_0$ tend to decrease.
Thus, the estimated probability of at least one neoplasm occurring at or before a
fixed time $t$ will tend to increase as the concentration increases, as is not
surprising. Note also that the estimates of $\beta_1$ are negative. Thus, the estimated
probability of at least one neoplasm occurring before time $t$ will tend to increase
as $t$ increases. The estimated values of $\beta_1$ do not appear to depend on the size of
the concentration (other than the control).
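
The bootstrap idea mentioned above can be illustrated with a short Python sketch. It is deliberately simplified: it resamples the fish of one hypothetical tank and recomputes an empirical log odds, whereas the standard errors in this appendix concern fitted regression coefficients, and the report does not spell out its resampling scheme. All names and data are assumptions.

```python
import math
import random
import statistics

def log_odds_no_neoplasm(outcomes):
    """Empirical log odds of no neoplasm, with 1/2 added to each count."""
    with_neoplasm = sum(outcomes)
    without = len(outcomes) - with_neoplasm
    return math.log((without + 0.5) / (with_neoplasm + 0.5))

def bootstrap_standard_error(outcomes, statistic, resamples=500, seed=0):
    """Standard deviation of the statistic over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(outcomes)
    values = [
        statistic([outcomes[rng.randrange(n)] for _ in range(n)])
        for _ in range(resamples)
    ]
    return statistics.stdev(values)

# Hypothetical tank: 20 sacrificed fish, 8 with at least one neoplasm (coded 1).
tank = [1] * 8 + [0] * 12
se = bootstrap_standard_error(tank, log_odds_no_neoplasm)
print(f"bootstrap standard error of the log odds: {se:.3f}")
```

For regression coefficients the same loop would refit the model on each resample and take the standard deviation of the refitted coefficients.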

5. Logistic Model: Time and Concentration as Covariates; Low-Concentration Prediction

A logistic model with covariates sacrifice time and concentration is estimated
using all data at concentrations 5.0, 10.0 and 20.0 mg/L; specifically, the model is
that the probability a fish develops at least one neoplasm by $t$ time units into the
experiment with concentration $d$ has the form

$$\frac{1}{1 + e^{\beta_0 + \beta_1 t + \beta_2 d}}.$$

For medaka that are 6 days of age at the start of the experiment the estimates
(asymptotic standard errors) [bootstrap standard errors] are

$$\begin{array}{ccc}
\hat\beta_0 = 6.43 & \hat\beta_1 = -0.65 & \hat\beta_2 = -0.17 \\
(0.46) & (0.054) & (0.02) \\
{[0.49]} & {[0.057]} & {[0.02]}
\end{array}$$

This estimated model can be used to predict the probability of a neoplasm
occurring  before  a  sacrifice  time  at  the  smaller  concentration  2.5  mg/L  and  the 
control.  These  predicted  probabilities  appear  in  Table  2  along  with  an  estimated 
standard  error.  Also  appearing  is  the  fraction  of  medaka  in  all  the  tanks  at 
concentration  2.5  mg/L  which  have  at  least  one  neoplasm;  also  displayed  are  the 
fractions  of  medaka  in  the  control  having  neoplasms. 
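
As a worked illustration, the Python sketch below evaluates the fitted model for the 6-day-old medaka at the control and at 2.5 mg/L, using the rounded estimates quoted above and assuming the model form 1/(1 + exp(b0 + b1 t + b2 d)). The values are computed here from those rounded estimates and are not copied from Table 2, so they need not agree with the table to the last digit.

```python
import math

# Rounded estimates reported above for medaka aged 6 days at the start.
BETA0, BETA1, BETA2 = 6.43, -0.65, -0.17

def predicted_probability(months, concentration_mg_per_l):
    """Predicted probability of at least one neoplasm by the given sacrifice time."""
    linear = BETA0 + BETA1 * months + BETA2 * concentration_mg_per_l
    return 1.0 / (1.0 + math.exp(linear))

for d in (0.0, 2.5):
    for t in (4, 6, 9):
        p = predicted_probability(t, d)
        print(f"d = {d:3.1f} mg/L, t = {t} months: predicted probability {p:.3f}")
```

Because the control corresponds to d = 0, its predicted probability depends only on the intercept and the time coefficient, both of which were estimated without any control data.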

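For medaka that are 52 days of age at the start of the experiment, the
estimates (asymptotic standard errors) [bootstrap standard errors] using all data at
concentrations 5.0, 10.0, 20.0 mg/L are

$$\begin{array}{ccc}
\hat\beta_0 = 6.03 & \hat\beta_1 = -0.45 & \hat\beta_2 = -0.12 \\
(0.48) & (0.05) & (0.02) \\
{[0.48]} & {[0.05]} & {[0.02]}
\end{array}$$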

This  estimated  model  can  be  used  to  predict  probabilities  of  at  least  one 
neoplasm  occurring  by  time  t  at  the  smaller  concentration  2.5  mg/L  and  the 
control.  These  predicted  probabilities  appear  in  Table  3  along  with  an  estimated 
standard  error.  Also  displayed  is  the  actual  fraction  of  medaka  that  were  found 
to  have  at  least  one  neoplasm. 

Both  estimated  models  resulted  in  predicted  probabilities  for  the  control  that 
are  larger  than  the  observed  fraction  of  medaka  having  at  least  one  neoplasm  in 
each  control  group,  particularly  at  the  long  exposure  time.  However,  the 
standard  errors  of  the  predictions  are  also  large.  The  predicted  probabilities  for 
the  2.5  mg/L  groups  are  closer  to  the  observed  fraction  for  the  experiment  using 
medaka  of  age  52  days  at  the  start  of  the  experiment. 

Table  4  (respectively  Table  5)  reports  results  of  predicting  the  fraction  of 
medaka  in  the  control  group  that  have  neoplasms  using  a  model  estimated  with 
data from the experiment with concentrations 2.5, 5.0, 10.0 and 20.0 mg/L; the
experiment is for medaka aged 6 days (respectively aged 52 days) at the start of
the  experiment.  The  predicted  fraction  for  the  medaka  sacrificed  at  9  months  is 
high. 

6. Logistic Model: Time and (Concentration × Time) as Covariates; Low-Concentration Prediction

A logistic model with covariates sacrifice time and concentration × (sacrifice
time) is estimated using all data at concentrations 5.0, 10.0, and 20.0 mg/L;
specifically, the model is that the probability a fish develops at least one
neoplasm by $t$ time units into the experiment with concentration $d$ has the form

$$\frac{1}{1 + e^{\beta_0 + \beta_1 t + \beta_2 d t}}. \tag{6.1}$$

For medaka that are 6 days of age at the start of the experiment the estimates
(asymptotic standard errors) are

$$\begin{array}{ccc}
\hat\beta_0 = 4.48 & \hat\beta_1 = -0.35 & \hat\beta_2 = -0.027 \\
(0.35) & (0.05) & (0.003)
\end{array}$$

This  estimated  model  can  be  used  to  predict  the  probability  of  a  neoplasm 
occurring  before  a  sacrifice  time  at  the  smaller  concentration  2.5  mg/L  and  the 
control.  These  predicted  probabilities  appear  in  Table  2a  along  with  the  bootstrap 
standard  error.  Also  appearing  is  the  fraction  of  medaka  in  all  the  tanks  at 
concentration  2.5  mg/L  which  have  at  least  one  neoplasm;  also  displayed  are  the 
fractions  of  medaka  in  the  control  having  neoplasms. 

For medaka that are 52 days of age at the start of the experiment, the
estimates (standard errors) using all data at concentrations 5.0, 10.0, 20.0 mg/L
are

$$\begin{array}{ccc}
\hat\beta_0 = 4.35 & \hat\beta_1 = -0.22 & \hat\beta_2 = -0.017 \\
(0.37) & (0.06) & (0.002)
\end{array}$$

This estimated model can be used to predict probabilities of at least one
neoplasm  occurring  by  time  t  at  the  smaller  concentration  2.5  mg/L  and  the 
control.  These  predicted  probabilities  appear  in  Table  3a  along  with  an  estimated 
standard  error.  Also  displayed  is  the  actual  fraction  of  medaka  that  were  found 
to  have  at  least  one  neoplasm. 

The  predictions  with  the  model  (6.1)  are  about  the  same  as  those  for  the 
logistic  model  with  covariates  concentration  and  time  of  sacrifice. 

7.  Logistic  Model:  Time  as  Covariate;  High-Sacrifice  Time  Prediction 

A logistic model with probability of at least one neoplasm occurring before
time $t$ of the form

$$\frac{1}{1 + e^{\beta_0 + \beta_1 t}}$$

is estimated using data for fish sacrificed at $t = 4$ and $6$ months for each of the
concentrations 2.5, 5.0, 10.0, and 20.0 mg/L. The resulting models are then used
to predict the probability for $t = 9$ months. The results using data from the
experiment with fish of age 52 days at the start of the experiment appear in
Table 6. The predicted probabilities are always larger than the observed fraction.
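
The extrapolation in this section can be sketched in Python under two assumptions: the model form is 1/(1 + exp(b0 + b1 t)), and counts are pooled over tanks within a concentration. With only two sacrifice times and two parameters, the maximum-likelihood fit reproduces the two observed fractions exactly (provided both lie strictly between 0 and 1), so the coefficients follow from the two observed log odds. The counts below are hypothetical.

```python
import math

def log_odds_no_neoplasm(with_neoplasm, sacrificed):
    """Log odds of no neoplasm from a pooled observed fraction."""
    p = with_neoplasm / sacrificed
    return math.log((1.0 - p) / p)

def fit_two_times(t1, theta1, t2, theta2):
    """Line through two (time, log odds) points: theta(t) = beta0 + beta1 * t."""
    beta1 = (theta2 - theta1) / (t2 - t1)
    beta0 = theta1 - beta1 * t1
    return beta0, beta1

def predicted_probability(beta0, beta1, months):
    """Probability of at least one neoplasm by the given sacrifice time."""
    return 1.0 / (1.0 + math.exp(beta0 + beta1 * months))

# Hypothetical pooled counts: 4 of 40 affected at 4 months, 10 of 40 at 6 months.
beta0, beta1 = fit_two_times(4, log_odds_no_neoplasm(4, 40), 6, log_odds_no_neoplasm(10, 40))
p9 = predicted_probability(beta0, beta1, 9)
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}, predicted at 9 months = {p9:.3f}")
```

The prediction at 9 months is a straight-line extrapolation on the log-odds scale, so any flattening of the true time trend after 6 months would make such predictions too large.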

8.  Logistic  Models  with  Other  Functional  Forms  for  the  Covariates 
Table  7  displays  results  of  using  the  model 

(8.1) 

estimated  using  data  for  5  mg/L,  10  mg/L  and  20  mg/L  to  predict  the  fraction  of 
medaka  with  at  least  one  neoplasm  for  2.5  mg/ £  treatment  group;  the  experiment 
using  fish  of  age  6  days  at  the  start  of  the  experiment  is  used.  The  model  under¬ 
predicts  the  fraction. 


Table 8 displays results of using the model

$$P(\text{develop neoplasm}) = \frac{1}{1 + \exp\{\beta_0 + \beta_1 \log t + \beta_2 \log d\}}$$

estimated using data for 5.0 mg/L, 10 mg/L and 20 mg/L to predict the fraction
of medaka with at least one neoplasm for the 2.5 mg/L treatment group, using
the experiment started with fish 6 days of age. The predicted fractions are within
2  standard  errors  of  the  observed. 
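
As a rough numerical check of model (8.1), the sketch below evaluates the predicted fraction for the 2.5 mg/L group at t = 4, 6, 9 months using the logistic estimates of a and b reported in Table 7 (a = 9.35, b = -2.16). It assumes that "log" is the natural logarithm and that d is in mg/L and t in months; the function name `predicted_fraction` is ours, not the authors'.

```python
import math

def predicted_fraction(a: float, b: float, d: float, t: float) -> float:
    """P(at least one neoplasm) = 1 / (1 + exp(a + b*log(d*t))).

    Assumption: natural log, d in mg/L, t in months.
    """
    return 1.0 / (1.0 + math.exp(a + b * math.log(d * t)))

# Logistic estimates of a and b as reported in Table 7.
A_HAT, B_HAT = 9.35, -2.16
DOSE = 2.5  # mg/L, the group being predicted

predictions = {t: predicted_fraction(A_HAT, B_HAT, DOSE, t) for t in (4, 6, 9)}
for t, p in predictions.items():
    print(f"t = {t} months: predicted fraction = {p:.2f}")
```

Under these assumptions the computed values can be compared directly with the "Predicted Fraction" column of Table 7.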

9.  Summary 

Incidences  of  hepatocellular  neoplasms  (adenomas  and/or  carcinomas)  as  a 
function  of  concentration  and  sacrifice  time  in  the  mega-medaka  study  are 
explored.  Graphical  displays  suggest  that  the  log  odds  of  at  least  one  neoplasm 
not  occurring  is  roughly  linearly  decreasing  with  concentration  and  time  of 
sacrifice.  Graphical  displays  also  suggest  that  the  age  of  the  medaka  at  the  start 
of  the  experiment  has  an  effect.  The  medaka  that  are  of  age  6  days  at  the  start  of 
the  experiment  tend  to  have  higher  incidence  of  at  least  one  neoplasm  than  those 
of  age  52  days  at  the  beginning  of  the  experiment. 

The  logistic  regression  model  is  used  to  explore  incidences  of  hepatocellular 
neoplasms  as  a  function  of  concentration  and  sacrifice  time.  The  estimated 
logistic  regression  models  for  fish  of  age  6  indicate  that  a  positive  concentration 
increases  the  probability  of  at  least  one  neoplasm  occurring.  A  positive 
concentration  also  increases  the  probability  of  at  least  one  neoplasm  developing 
as  the  time  of  sacrifice  increases.  However,  there  is  no  apparent  association 
between  the  size  of  the  concentration  and  the  probability  increases  as  a  function 
of  sacrifice  time. 

A statistical model can be used not only to describe data but also to
predict data. We explore the ability of logistic regression models with different
covariates, estimated with one part of the data, to predict the other part. A
summary of the results follows.

The  prediction  of  the  fractions  of  medaka  with  at  least  one  neoplasm  at 
2.5  mg/L  concentration  and  the  control  using  logistic  regression  models 
estimated  using  the  data  from  higher  concentrations  is  explored;  the  model 
covariates  are  concentration  and  time  of  sacrifice.  The  prediction  results  using 
logistic  models  are  better  for  2.5  mg/L  than  those  for  the  control.  The  prediction 
results  for  smaller  sacrifice  times  are  better  than  those  for  larger  sacrifice  times. 
The  predicted  fractions  tend  to  be  larger  than  the  estimated  fractions  for  the 
covariates  of  concentration  and  time  of  sacrifice.  Models  with  covariates  (time  of 
sacrifice)  and  (concentration  x  time  of  sacrifice)  were  also  considered.  The 
prediction  results  were  similar. 

The prediction of the fraction of medaka developing at least one neoplasm
before the sacrifice time at t = 9 using a model fit using other data is explored; the
covariates are concentration and time of sacrifice. The models fit using data from
positive concentrations do not predict the fraction from the control group well;
the models tend to overpredict the fraction. The models fit using data for t = 4, 6
within a concentration also tend to overpredict the fraction at t = 9.

The  prediction  of  the  fraction  of  fish  developing  at  least  one  neoplasm  at 
2.5  mg/L  concentration  using  a  logistic  model  with  covariates  log  concentration 
and  log  time  of  sacrifice  estimated  with  data  at  larger  concentrations  is  explored 
using  the  experiment  started  with  fish  6  days  old.  The  predicted  fractions  are 
smaller than the observed fractions.

Table 1
Medaka Aged 6 Days at Start of Experiment

Model: The number of fish with neoplasms has a binomial distribution with
probability

$$\frac{1}{1 + \exp\{\beta_0 + \beta_1 t\}}$$

where t is the time of sacrifice.

Estimates of $\beta_0$ and $\beta_1$ (std errors from Fisher Information) and fraction of medaka
with neoplasms at each sacrifice time, by treatment and tank

Treatment   Tank   beta_0 (s.e.)    beta_1 (s.e.)    4 months   6 months   9 months
control     1      11.5 (93)        -0.11 (14)       0/25       0/25       0/14
control     2      25.7 (73)        -2.7 (8.1)       0/24       0/24       2/11
control     3      11.6 (94)        -0.14 (14)       0/25       0/25       0/12
control     4      11.0 (90)        -0.02 (13)       0/25       0/25       0/23
control     All    28.6 (107)       -2.8 (11)
2.5 mg      1      6.1 (2.1)        -0.53 (0.27)     0/25       2/25       3/16
2.5 mg      2      3.4 (1.4)        -0.22 (0.21)     0/24       6/25       1/11
2.5 mg      3      6.8 (1.9)        -0.71 (0.22)     1/25       1/25       8/19
2.5 mg      4      29.7 (114.8)     -3.2 (12.8)      0/25       0/25       6/21
2.5 mg      All    6.0 (0.92)       -0.56 (0.12)
5.0 mg      1      8.8 (2.9)        -0.90 (0.35)     0/25       1/25       6/19
5.0 mg      2      5.5 (1.3)        -0.64 (0.18)     2/24       3/25       11/18
5.0 mg      3      4.4 (1.2)        -0.47 (0.17)     1/25       6/25       6/14
5.0 mg      4      6.4 (1.4)        -0.81 (0.20)     0/25       6/24       10/15
5.0 mg      All    5.6 (0.71)       -0.63 (0.09)
10.0 mg     1      5.0 (1.18)       -0.66 (0.172)    3/25       5/25       12/16
10.0 mg     2      4.1 (1.05)       -0.62 (0.17)     4/25       11/25      13/16
10.0 mg     3      6.0 (1.4)        -0.76 (0.20)     0/25       7/25       9/14
10.0 mg     4      4.2 (1.10)       -0.66 (0.18)     3/25       13/25      11/14
10.0 mg     All    4.58 (0.56)      -0.64 (0.09)
20.0 mg     1      6.3 (1.6)        -1.3 (0.33)      7/25       21/25      18/18
20.0 mg     2      2.2 (0.95)       -0.42 (0.16)     8/25       16/25      10/13
20.0 mg     3      5.7 (1.6)        -1.2 (0.32)      6/24       19/25      7/7
20.0 mg     4      2.0 (0.97)       -0.46 (0.17)     11/25      18/25      14/16
20.0 mg     All    3.4 (0.58)       -0.69 (0.11)
The labeling of tank number in the table sets the smallest tank number in a
treatment group equal to 1, ..., and the largest tank number in a treatment group
equal to 4.

The  fraction  of  medaka  has  as  numerator  the  number  of  fish  with  combined 
hepatocellular  neoplasms  and  as  denominator  the  number  of  fish  livers 
examined.  The  fractions  from  left  to  right  for  each  tank  are  for  4  months  sacrifice, 
6  months  sacrifice,  and  9  months  sacrifice. 
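
The kind of fit summarized in Table 1 can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes the model P = 1/(1 + exp(beta_0 + beta_1 t)), pools the four tanks of the 10.0 mg/L group from Table 1 (10/100, 36/100 and 45/60 at 4, 6 and 9 months), and uses Fisher scoring with standard errors taken from the inverse Fisher information. The function name `fit_logistic` is hypothetical.

```python
import numpy as np

def fit_logistic(times, n_pos, n_exam, n_iter=50):
    """Fisher scoring for the grouped-binomial model P = 1/(1 + exp(b0 + b1*t)).

    Returns the estimates (b0, b1) and their Fisher-information standard errors.
    """
    t = np.asarray(times, dtype=float)
    n = np.asarray(n_pos, dtype=float)
    m = np.asarray(n_exam, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(X @ beta))
        score = X.T @ (m * p - n)                       # gradient of log-likelihood
        info = X.T @ (X * (m * p * (1.0 - p))[:, None]) # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = 1.0 / (1.0 + np.exp(X @ beta))
    info = X.T @ (X * (m * p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

# Pooled 10.0 mg/L counts from Table 1 (numerators and denominators summed over tanks).
beta, se = fit_logistic(times=[4, 6, 9], n_pos=[10, 36, 45], n_exam=[100, 100, 60])
print("estimates:", beta, "std errors:", se)
```

The output can be compared with the "All" row for 10.0 mg in Table 1; small differences would be expected if the original fit treated the tanks differently.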
Table 2

Predicted fractions of medaka with at least one neoplasm occurring by time t for
control group and concentration 2.5 mg/L group using model estimated with
data for higher concentrations.

Model with Covariates: Time of Sacrifice, Concentration
Medaka Aged 6 days at Start of Experiment

Table 2a

Predicted fractions of medaka with at least one neoplasm occurring by time t for
control group and concentration 2.5 mg/L group using a logistic regression
model estimated using data for higher concentrations.

Model with Covariates: (Time of Sacrifice),
((Time of Sacrifice) x Concentration)

Medaka Aged 6 days at Start of Experiment


Time 

Predicted 
Fraction  for  2.5 
mg/L  treatment 
group 

[bootstrap  s.e.] 

Fraction  of 
Medaka  in 
treatment  group 
2.5  mg/L  having 
the  symptom 

Predicted 

Fraction  for 
Control  Group 

Fraction  of 
Medaka  in 
Control  Group 
having  the 
symptom 

4 

0 

- 

6 

0.12 

0.09 

0.08 

0 

(0.04) 

[0.03] 

- 

9 

0.27 


(0.05)

Table 3

Predicted fractions of medaka with at least one neoplasm occurring by time t for
control and concentration 2.5 mg/L using model estimated with data for higher
concentrations.

Model with Covariates: Time of Sacrifice and Concentration
Medaka Aged 52 days at Start of Experiment

Time

Predicted 
Fraction  for  2.5 
mg/L  treatment 
group  from 
Estimated  Model 
(standard  error) 
[bootstrap  s.e.] 

Fraction  of 
Medaka  in 
treatment  group 
2.5  mg/L  having 
the  symptom 
(standard  error) 

Pred.  Frac.  for 
control  from  Est. 
Model 

(standard  error) 
[bootstrap  s.e.] 

Fraction  of 
Medaka  in 
control  having 
symptom 

(standard  error) 

4 

0.02 

(0.01) 

[0.02] 

0.01 

(0.01) 

0.02 

(0.01) 

6 

0.04 

0 

(0.02) 

- 

9 

0.14 

0.12 

0.02 

(0.04) 

(0.04) 

(0.01) 

[0.04]

Table 3a

Predicted  fractions  of  medaka  with  at  least  one  neoplasm  occurring  by  time  t  for 
control  group  and  concentration  2.5  mg/L  group  using  model  estimated  with 
data  for  higher  concentrations. 

Model  with  Covariates:  (Time  of  Sacrifice) 

((Time  of  Sacrifice )  x  Concentration) 


Medaka  Aged  52  days  at  Start  of  Experiment 


Time 

Predicted 
Fraction  for  2.5 
mg/L  treatment 
group  from 
Estimated  Model 
[bootstrap  s.e.] 

Fraction  of 
Medaka  in 
treatment  group 
2.5  mg/L  having 
the  symptom 
(standard  error) 

Pred. Frac. for
control  from  Est. 
Model 

[bootstrap  s.e.] 

Fraction  of 
Medaka  in 
control  having 
symptom 

(standard  error) 

4 

0.01 

(0.01) 

6 

0.04 

0 

(0.02) 

- 

9 

0.14 

0.02 

(0.04) 

(0.01)

Table 4

Predicted fraction of medaka with at least one neoplasm occurring by time t for
the control using model estimated with data for positive concentrations.

Medaka Aged 6 days at Start of Experiment

Time   Predicted Fraction for Control       Fraction of Medaka in Control
       from Estimated Model                 having the symptom
       (standard error) [bootstrap s.e.]    (standard error)
4      0.02 (0.01) [0.01]                   0
6      0.06 (0.03) [0.02]                   0 (-)
9      0.28 (0.07) [0.07]                   0.03 (0.02)

$$P\{\text{at least one neoplasm by time } t\} = \frac{1}{1 + \exp\{\beta_0 + \beta_1 t + \beta_2 d\}}$$

where t is the time of sacrifice and d is the concentration.

                    beta_0    beta_1    beta_2
Estimates           6.6       -0.63     -0.18
(std error)         (0.42)    (0.05)    (0.01)
[bootstrap s.e.]    [0.41]    [0.05]    [0.01]

Table 5

Predicted fraction of medaka with at least one neoplasm occurring by time t for
the control using model estimated with data for positive concentrations.

Medaka Aged 52 days at Start of Experiment

Time   Predicted Fraction of Control        Observed Fraction
       with neoplasms
       (standard error) [bootstrap s.e.]    (standard error)
4      0.01 (0.01) [0.01]                   0.02 (0.01)
6      0.03 (0.02) [0.02]                   0 (-)
9      0.11 (0.04) [0.04]                   0.02 (0.01)

                    beta_0    beta_1    beta_2
Estimates           6.16      -0.45     -0.12
(std error)         (0.43)    (0.05)    (0.01)
[bootstrap s.e.]    [0.44]    [0.05]    [0.01]

Table 6

Fish of Age 52 Days at Start of Experiment

The model is

$$P(\text{develop neoplasm}) = \frac{1}{1 + \exp\{\beta_0 + \beta_1 t\}}$$

Estimate parameters using data for fish sacrificed at t = 4, 6.

Predicted proportion of medaka with at least one neoplasm for those sacrificed at
t = 9.


Estimates 


(standard  errors) 


7.40  -0.70 

(3.2)  (0.56)


(119)

(all 0's at t=4)


(19.8) 


-0.93 

(0.32) 


-0.82 

(0.21) 


Predicted 
Fraction  at  t=9 

(standard  errors) 


Fraction  of  Medaka 
sacrificed  at  t=9 
with  symptom 
(standard  errors) 


(0.0002)

Table 7

Predicted fraction of medaka with at least one neoplasm occurring by time t for
2.5 mg/L using model estimated with data for 5 mg/L, 10 mg/L and 20 mg/L.

Model

$$P(\text{abnormal}) = \frac{1}{1 + e^{a + b \log(d \times t)}}$$

                 a                        b
                 Logistic    Bootstrap    Logistic    Bootstrap
Estimates        9.35        9.37         -2.16       -2.16
(std error)      (0.71)      [0.70]       (0.17)      [0.16]

Medaka Aged 6 days at Start of Experiment

Time   Predicted Fraction for 2.5 mg/L    Fraction of Medaka for 2.5 mg/L
       from Estimated Model               having the symptom
       [bootstrap s.e.]
4      0.01 [0.01]                        0.01
6      0.03 [0.01]                        0.09
9      0.07 [0.03]                        0.27

Table 8

Predicted fraction of medaka with at least one neoplasm occurring by time t for
2.5 mg/L using model estimated with data for 5 mg/L, 10 mg/L and 20 mg/L.

Model

$$P(\text{abnormal}) = \frac{1}{1 + \exp\{\beta_0 + \beta_1 \log t + \beta_2 \log d\}}$$

Medaka  Aged  6  days  at  Start  of  Experiment 


ih  \  h 

Logistic  Bootstrap  Logistic  Bootstrap  Logistic  Bootstrap 


-4.07  -4.11  -1.9 

(0.34)  [0.36]  (0.1 


Medaka Aged 6 days at Start of Experiment        Medaka Aged 52 days at Start of Experiment


Time 

4 

Predicted 

Fraction  for  2.5 
mg/L  from 
Estimated  Model 
[bootstrap  s.e.] 

Fraction  of 
Medaka  for  2.5 
mg/L  having  the 
symptom 
(standard  error) 

Predicted 

Fraction  for  2.5 
mg/L  from 
Estimated  Model 
[bootstrap  s.e.] 


6 

0.09 

0.02 

0.04 

(0.04) 

(0.02) 

9 

0.20 

0.27 


0.14 

[0.07] 

(0.05) 

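(0.04)

Appendix A

Standard Errors for the Predicted Fraction of Medaka
Which Have at Least One Neoplasm

1. Asymptotic Variance for Estimated Probability of at Least One Neoplasm
Occurring

The logistic model for the probability of at least one neoplasm occurring is

$$\theta(\beta; x) = \frac{1}{1 + \lambda(\beta; x)}$$

where

$$\lambda(\beta; x) = \exp\Big\{\sum_{j=1}^{p} \beta_j x_j\Big\}$$

with covariates $(x_1, \ldots, x_p)$.

Assuming the model is correct, let $\beta_j^0$ be the true value of $\beta_j$.

An approximate asymptotic variance for $\lambda(\hat\beta; x)$ is based on the first two
terms of a Taylor expansion

$$\lambda(\hat\beta; x) \approx \lambda(\beta^0; x) + \sum_{j=1}^{p} \lambda(\beta^0; x)\, x_j\, (\hat\beta_j - \beta_j^0)$$

$$\mathrm{Var}[\lambda(\hat\beta; x)] \approx \sum_{j=1}^{p} \lambda(\beta^0; x)^2 x_j^2\, \mathrm{Var}(\hat\beta_j) + 2 \sum_{j<k} \lambda(\beta^0; x)^2 x_j x_k\, \mathrm{Cov}(\hat\beta_j, \hat\beta_k)$$

An asymptotic variance for $\theta(\hat\beta; x)$ is

$$\mathrm{Var}[\theta(\hat\beta; x)] \approx \Big[\frac{1}{(1 + \lambda(\beta^0; x))^2}\Big]^2 \mathrm{Var}[\lambda(\hat\beta; x)]$$

$$= \Big[\frac{\lambda(\beta^0; x)}{(1 + \lambda(\beta^0; x))^2}\Big]^2 \Big[\sum_{j=1}^{p} \mathrm{Var}[\hat\beta_j]\, x_j^2 + 2 \sum_{j<k} x_j x_k\, \mathrm{Cov}(\hat\beta_k, \hat\beta_j)\Big]$$

$$= \theta(\beta^0; x)^2 \big[1 - \theta(\beta^0; x)\big]^2 \Big[\sum_{j=1}^{p} \mathrm{Var}[\hat\beta_j]\, x_j^2 + 2 \sum_{j<k} x_j x_k\, \mathrm{Cov}(\hat\beta_k, \hat\beta_j)\Big]$$

Asymptotic estimates of the variance and covariance of $\hat\beta_j$ can be obtained using
Fisher Information.

2. Asymptotic Variance for Predicted Fraction of Sacrificed Fish with at Least
One Neoplasm Occurring

To obtain a variance for the predicted fraction of sacrificed fish with
neoplasms let M be the number of medaka sacrificed, N be the number of
medaka having at least one neoplasm, and $\theta$ be the (unknown) probability of at
least one neoplasm occurring by the time of sacrifice. The variance of the fraction
of medaka that have one or more neoplasms is

$$\mathrm{Var}\Big[\frac{N}{M}\Big] = E\Big[\mathrm{Var}\Big[\frac{N}{M} \,\Big|\, \theta\Big]\Big] + \mathrm{Var}\Big[E\Big[\frac{N}{M} \,\Big|\, \theta\Big]\Big]$$

$$= E\Big[\frac{1}{M^2} M \theta (1 - \theta)\Big] + \mathrm{Var}[\theta]$$

$$= \frac{1}{M} \Big\{E[\theta] - \big(\mathrm{Var}[\theta] + E[\theta]^2\big)\Big\} + \mathrm{Var}[\theta]$$

Thus an (asymptotic) estimate of the variance of the predicted fraction of
medaka having the symptom is

$$\frac{1}{M} \Big\{\hat\theta - \big(\widehat{\mathrm{Var}}[\hat\theta] + \hat\theta^2\big)\Big\} + \widehat{\mathrm{Var}}[\hat\theta]$$

where $\hat\theta = \theta(\hat\beta; x)$ and $\widehat{\mathrm{Var}}[\hat\theta]$ is the asymptotic variance of Section 1
evaluated at $\hat\beta$.

Bootstrap assessments of estimate standard errors and predictive proportions
use simulation. The original data $(m_i, n_i, x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, I$, are used to
estimate the parameters of the logistic model, $\hat\beta_j$, $j = 1, \ldots, p$. A bootstrap
replication consists of the following. For each $i = 1, \ldots, I$, a binomial random
number $n_i^*$ with $m_i$ trials and probability of success

$$\Big[1 + \exp\Big\{\sum_{j=1}^{p} x_{ij} \hat\beta_j\Big\}\Big]^{-1}$$

is drawn. The simulated data $(m_i, n_i^*, x_{i1}, \ldots, x_{ip})$ are used to estimate the parameters
of the logistic model, $\beta_j^*$, $j = 1, \ldots, p$. These bootstrap estimates are then used to
predict the fraction of medaka that have neoplasms out of K fish sacrificed that
have covariates $y_1, \ldots, y_p$ in the following manner. A binomial random number is
drawn having K trials and probability of success

$$\Big[1 + \exp\Big\{\sum_{j=1}^{p} y_j \beta_j^*\Big\}\Big]^{-1}.$$

The random number divided by the number of trials gives a bootstrap replication of
the predicted fraction. This ends one bootstrap replication.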

The  bootstrap  estimate  of  standard  error  of  an  estimate  is  the  square  root  of 
the  sample  variance  of  the  bootstrap  estimate.  The  bootstrap  standard  error  for 
the  prediction  proportion  is  the  square  root  of  the  sample  variance  of  bootstrap 
predicted  fractions. 
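
The bootstrap procedure described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes the model P = 1/(1 + exp(x'beta)), uses the pooled 10.0 mg/L counts from Table 1 as example data, predicts for K = 60 fish at t = 9, and clips the linear predictor only as a numerical safeguard. The names `fit_logistic` and `bootstrap` are hypothetical.

```python
import numpy as np

def fit_logistic(X, n_pos, n_exam, n_iter=50):
    """Fisher scoring for the grouped-binomial model P = 1/(1 + exp(X @ beta))."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)   # numerical safeguard only
        p = 1.0 / (1.0 + np.exp(eta))
        score = X.T @ (n_exam * p - n_pos)
        info = X.T @ (X * (n_exam * p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def bootstrap(X, n_pos, n_exam, y, K, n_rep=500, seed=0):
    """Parametric bootstrap s.e. of the estimates and of the predicted fraction."""
    rng = np.random.default_rng(seed)
    beta_hat = fit_logistic(X, n_pos, n_exam)
    p_hat = 1.0 / (1.0 + np.exp(X @ beta_hat))
    betas, fracs = [], []
    for _ in range(n_rep):
        # Simulate counts from the fitted model, then refit.
        n_star = rng.binomial(n_exam.astype(int), p_hat).astype(float)
        beta_star = fit_logistic(X, n_star, n_exam)
        # Simulate the predicted fraction among K fish with covariates y.
        p_pred = 1.0 / (1.0 + np.exp(y @ beta_star))
        fracs.append(rng.binomial(K, p_pred) / K)
        betas.append(beta_star)
    se_beta = np.std(np.array(betas), axis=0, ddof=1)
    se_frac = float(np.std(fracs, ddof=1))
    return beta_hat, se_beta, se_frac

# Example data: pooled 10.0 mg/L counts from Table 1 at t = 4, 6, 9 months.
X = np.array([[1.0, 4.0], [1.0, 6.0], [1.0, 9.0]])
n_pos = np.array([10.0, 36.0, 45.0])
n_exam = np.array([100.0, 100.0, 60.0])
beta_hat, se_beta, se_frac = bootstrap(X, n_pos, n_exam, y=np.array([1.0, 9.0]), K=60)
print(beta_hat, se_beta, se_frac)
```

The sample standard deviations use the n - 1 divisor, matching the "square root of the sample variance" wording; the seed is fixed only to make the sketch reproducible.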

The bootstrap results reported use 500 bootstrap replications.

Appendix C

Incidences of Hepatocellular Neoplasms (Adenoma[s] and/or Carcinoma[s]) Combined.
Numerator is Number of Fish with Combined Hepatocellular Neoplasms.
Denominator is Number of Fish (Livers) Examined.
Columns give Time Post Initial Exposure. A "?" marks an entry that is illegible in this appendix.

Fish at 6 Days of Age at Start of Test
(entries illegible in this appendix are taken from the fractions in Table 1)

Group       Tank   4 Months   6 Months   9 Months
20.0 mg/L   ?      11/25      18/25      14/16
20.0 mg/L   ?      6/24       19/25      7/7
20.0 mg/L   25     8/25       16/25      10/13
20.0 mg/L   ?      7/25       21/25      18/18
10.0 mg/L   ?      3/25       13/25      11/14
10.0 mg/L   58     0/25       7/25       9/14
10.0 mg/L   ?      4/25       11/25      13/16
10.0 mg/L   ?      3/25       5/25       12/16
5.0 mg/L    ?      0/25       6/24       10/15
5.0 mg/L    42     1/25       6/25       6/14
5.0 mg/L    ?      2/24       3/25       11/18
5.0 mg/L    ?      0/25       1/25       6/19
2.5 mg/L    ?      0/25       0/25       6/21
2.5 mg/L    ?      1/25       1/25       8/19
2.5 mg/L    ?      0/24       6/25       1/11
2.5 mg/L    ?      0/25       2/25       3/16
Control     ?      0/25       0/25       0/23
Control     ?      0/25       0/25       0/12
Control     ?      0/24       0/24       2/11
Control     ?      0/25       0/25       0/14

Fish at 52 Days of Age at Start of Test

Group       Tank   4 Months   6 Months   9 Months
20.0 mg/L   57     2/25       6/25       12/19
20.0 mg/L   34     4/24       9/25       11/20
20.0 mg/L   20     2/25       10/25      15/23
20.0 mg/L   ?      0/25       6/25       10/17
10.0 mg/L   ?      ?          6/25       6/22
10.0 mg/L   40     1/25       4/25       ?
10.0 mg/L   23     1/24       4/25       3/21
10.0 mg/L   ?      0/24       3/25       ?
5.0 mg/L    63     0/24       1/25       1/23
5.0 mg/L    37     0/25       4/25       3/23
5.0 mg/L    26     0/25       3/25       ?
5.0 mg/L    ?      0/25       4/25       ?
2.5 mg/L    ?      0/25       2/25       ?
2.5 mg/L    43     0/24       0/25       ?
2.5 mg/L    29     0/25       0/25       ?
2.5 mg/L    ?      1/25       2/25       ?
Control     52     1/24       0/25       1/24
Control     39     0/24       0/25       0/24
Control     30     1/25       0/24       ?
Control     ?      0/25       0/25       ?
An Exploratory Analysis of Data from a Mega-Medaka Study

[Figures: fraction of medaka with at least one neoplasm/carcinoma, plotted against dose and against time of sacrifice, for medaka aged 6 days and for medaka aged 52 days at start of experiment.]

[Figures: log odds of no neoplasms/carcinomas occurring (vertical axis: log odds of surviving), plotted against dose and against time of sacrifice, for medaka aged 6 days and for medaka aged 52 days at start of experiment.]


Here  are  some  questions,  areas  of  investigation,  and  statistical  approaches  to 
the  Medaka  data  sets  (Sacrifice  B,  and  potentially  Sacrifice  D).  We  also  describe 
results  of  some  exploratory  data  analyses  performed  at  the  Naval  Postgraduate 
School. 

1.  Data  and  Data  Summaries 

(a)  There  are  missing  data,  e.g.  fish  58,  67.  There  may  be  others.  These  are 
coded  99999.  Leave  these  out;  do  not  enter  the  above  defaults! 

(b)  There  are  some  obvious  outliers,  e.g.  fish  16's  weight,  which  is 
surprisingly  low.  These  can  distort  automatic  fits  or  ANOVAs.  If  spotted  they 
can  be  removed.  We  should  investigate  outlier  causes.  Possibly  the  obvious 
outliers  should  be  revisited,  images  redone,  also  summary  numbers. 

(c) Not all fish have 5 slices examined.

If robust statistical methods were used (unresponsive to aberrant values) both
(a) and (b) could be accomplished automatically: the outliers would be
de-emphasized. Eventually we could go in that direction. But it would be better
to understand why they occurred and try to fix the situation.


2.  Liver  Area  and  Animal  Weight 

2.1.  Some  Suggestions  for  Statistical  Analysis 

(a)  It  is  anticipated  that  fish  liver  area  should  be  roughly  positively 
correlated/ associated  with  fish  weight. 

(b) Since dimensionally $\text{AREA} = (\text{VOLUME})^{2/3}$, and hence for constant
weight density

$$\text{AREA} = (\text{WEIGHT})^{2/3},$$

it is worthwhile to examine plots that show both AREA vs. WEIGHT and AREA
vs. $(\text{WEIGHT})^{2/3}$.

In  (c-1)  and  (c-2)  below  we  describe  results  of  exploratory  data  analysis  of 
AREA  vs.  WEIGHT.  The  analysis  is  done  on  normalized  residuals  of  area  and 
normalized  residuals  of  weight.  We  compute  the  normalized  residuals  as 
follows. 

(b-1) Compute mean of all weights, mean of all $(\text{weights})^{2/3}$, and mean
of all areas. Compute standard deviation ($\sqrt{\mathrm{Var}}$) of all weights, $(\text{weights})^{2/3}$, and
areas. Compute the normalized residuals for the i-th fish (i = 1, 2, ..., all):

$$w_i = \frac{(\text{weight of } i\text{th fish}) - \text{mean(weight)}}{\text{sd(weight)}}$$

$$w_i(2/3) = \frac{(\text{weight of } i\text{th fish})^{2/3} - \text{mean}\big((\text{weight})^{2/3}\big)}{\text{sd}\big((\text{weight})^{2/3}\big)}$$

$$a_i = \frac{(\ell.\text{area of } i\text{th fish}) - \text{mean}(\ell.\text{areas})}{\text{sd}(\ell.\text{areas})}$$

$\ell$.area means liver area.

(b-2) Plot $w_i$ vs. $a_i$ and $w_i(2/3)$ vs. $a_i$.

(b-3)  Fit  a  straight-line  regression  to  the  residuals. 
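
Steps (b-1) through (b-3) can be sketched as below. The weights and liver areas are made-up illustrative numbers, not data from the study, and the helper name `normalized_residuals` is ours. The sample standard deviation (n - 1 divisor) is an assumption; the text does not say which divisor was used.

```python
import numpy as np

def normalized_residuals(values):
    """(value - mean) / sd, using the sample standard deviation (assumed)."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

# Hypothetical fish weights and liver areas; NOT data from the study.
weight = np.array([0.21, 0.25, 0.30, 0.28, 0.35, 0.40, 0.33, 0.27])
area = np.array([1.10, 1.30, 1.52, 1.40, 1.75, 1.95, 1.60, 1.38])

w = normalized_residuals(weight)                  # w_i
w23 = normalized_residuals(weight ** (2.0 / 3.0)) # w_i(2/3)
a = normalized_residuals(area)                    # a_i

# (b-3) straight-line fits: residual liver area = A + B * (residual weight)
slope, intercept = np.polyfit(w, a, 1)
slope23, intercept23 = np.polyfit(w23, a, 1)
r_squared = np.corrcoef(w, a)[0, 1] ** 2
print(slope, intercept, r_squared)
```

Because both variables are standardized over the same set of fish, the fitted slope equals the sample correlation and the intercept is zero in this sketch; in the report the standardization is over all fish while fits are by treatment group, so nonzero intercepts can appear there.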

2.2.  Results  of  Exploratory  Data  Analysis 

(c-1) Table 1 displays the least squares regression estimates of residual
liver area versus residual weight by treatment group. When we did this for
controls we got a high coefficient of determination, $R^2$, of approximately 0.8, which suggests
a reasonable linear relation; plots show this too. Note that the $R^2$ values are
smaller for those treatment groups treated with DEN compared to those not
treated with DEN for the same level of TCE.

(c-2) Figure 1 displays scatter plots of residual liver area versus residual
weight for all data except 100 mg/L DEN as separate plots by treatment; the data
from all tanks in a treatment group are combined; note that the plots are not on
the same vertical scale. Figures 2-7 are plots of residual liver area versus residual
weight for all tanks in each treatment group. Linear behavior of the above
plots/regressions degenerates with concentration, particularly of DEN: the
linearity is not at all apparent, and scatter around a fitted line becomes
comparatively large; see right-hand side of Figure 1 and Figures 2-7. Note that
the plot for the control group in Figure 1 (upper left-most) is on a different scale
than other plots; one reason for this is the outlying value for fish 21. For some
fish the liver areas grow greatly, while for others of nearly the same residual
weight the residual liver area is quite small.

Comment. The above effect of toxin on liver size suggests that response may not
be a simple mean effect, but rather may tend to reflect the different effect of toxin
concentration on different fish, i.e. increase the variability of response. This may
mean that variance, rather than mean, could be a useful quantifier of response to
nominal concentration. A partial explanation of the variability of liver response
might be that nominal concentration and actual dose, at organ level, are very
different. We know of no way to investigate this experimentally.

It  seems  possible  that  the  above  effect  will  be  amplified  by  longer  exposure. 
Examination  of  Sacrifice  D  data  should  be  interesting. 


TABLE 1

Linear Least Squares Regressions for
Residual Liver Area versus Residual Weight
Residual Liver Area = A + B (Residual Weight)
Estimates
[ ] = 95% normal confidence interval

DEN      TCE      A                      B                     R2      Variance of
(mg/L)   (mg/L)                                                        Regression Residual
0        0        0.19 [-0.06,0.44]      1.2 [0.79,1.5]        0.78    0.14    (control) (without fish 16)
10       0        -0.39 [-0.83,0.06]     0.24 [-0.37,0.85]     0.05    0.58
0        0.1      -0.43 [-0.61,-0.25]    0.56 [0.40,0.71]      0.82    0.10    (1 data point missing)
10       0.1      0.05 [-0.32,0.42]      0.49 [0.17,0.82]      0.45    0.40    (1 data point missing)
0        1.0      0.13 [-0.27,0.53]      0.84 [0.40,1.3]       0.54    0.52
10       1.0      0.32 [-0.13,0.76]      0.41 [0.03,0.79]      0.28    0.50

3.1. Some Suggestions for Statistical Analysis

(a) The counting index (CI) and area index (AI) both tend to measure the
same  quantity,  namely  the  numbers  of  cells  in  S-phase  within  the  ROI.  An 
increase  in  that  number  is  supposed  to  indicate  cell  proliferation  related  to  the 
concentration,  i.e.  DEN  and  TCE  in  various  concentrations. 

(b)  The  area  index  may  tend  to  grow  (or  otherwise  change)  over  time  and 
with  concentration  because  of  the  effect  of  DEN  and/or  TCE  on  cell  size, 
represented  by  presented  (stained /black,  also  that  of  other  cells)  area  in  a  slice. 
The  number  of  stained  cells  may  also  grow,  but  the  former  effect,  on  AI,  may  be 
(i)  greater,  and  (ii)  make  it  more  variable,  as  time  and  toxin  level  increase.  There 
is  the  possibility  of  bias,  and  extra  variability.  It  might  actually  be  that  the  number 
of  cells  in  S-phase  stays  the  same  or  decreases  with  increased  concentration,  but 
the  size/area  of  those  remaining  increases  more  than  enough  to  overcome  this. 

(c)  Recommendation: 

(1) Study graphically (plot) AI vs CI for various concentration levels.

(2)  Do  so  for  all  data  combined  in  a  treatment  group,  and  by  replicate; 
the  latter  should  indicate  between-tank  variation. 

This  has  been  (partially)  done  at  BRDL.  The  indication  is  of  a  fairly  linear 
relation,  but  the  variability  increases  with  toxin  concentration. 

3.2.  Results  of  an  Exploratory  Data  Analysis 

3.2.1.  Preliminary  Study  of  Area  Index  (AI)  by  Treatment 

This subsection reports results of a preliminary look at area index

$$\text{area index} = \frac{\text{area of positive hepatocytes}}{\text{area of region of interest}}$$

(without multiplying by 100).

Figure 8 is a scatter plot of residual mean area of the region of interest (ROI)
versus the residual mean area of the positive hepatocytes where

$$\text{residual mean area of the region of interest} = \frac{(\text{x ri area}) - \text{Mean}(\text{x ri area})}{\sqrt{\text{Variance}(\text{x ri area})}}$$

and

$$\text{residual mean area of the positive hepatocytes} = \frac{(\text{x hep are}) - \text{Mean}(\text{x hep are})}{\sqrt{\text{Variance}(\text{x hep are})}}$$

The scatter plot indicates that there is no relation between the mean area of the
region of interest and the mean area of the positive hepatocytes. It also indicates
that Fish 21 is an outlier.

Figure  9  displays  box  plots  of  the  area  index  (mean  area  of  positive 
hepatocytes  =  x  hep  are) /(mean  area  of  region  of  interest  =  x  ri  area)  by 
treatment  group;  the  mean  is  over  the  slices;  note  that  our  area  index  is  the  ratio 
not multiplied by 100. The treatment group of 100 mg/L of DEN was omitted.
The upper side of the box is at the 75% quantile of the area index. The lower side
of  the  box  is  at  the  25%  quantile.  The  mean  of  the  area  index  is  represented  as  a 
circle  in  the  box  and  the  median  is  a  line  across  the  box.  Fish  21  is  omitted. 

Figure  10  displays  box  plots  of  the  mean  area  of  positive  hepatocytes  by  the 
same  treatment  group.  Fish  21  is  omitted. 

Figures 9 and 10 are not that much different. Both plots indicate the major
difference in response is between the control group and the DEN with 1.0 mg/L
group.

Box  plots  are  summary  plots.  To  further  investigate  possible  associations  of 
area  index  and  treatment,  the  area  indices  are  plotted  against  treatments. 
Figure 11 presents the area indices versus the level of TCE for all data. The levels
of TCE are those found on the tables. In particular, fish 81-96 are exposed to
0 mg/L DEN and 1.425 mg/L TCE and fish 97-112 are exposed to 10.0 mg/L DEN
and 1.339 mg/L level of TCE. There does not appear to be much association.

Figure  12  plots  the  area  index  versus  level  of  DEN  exposure.  Again,  no 
association  is  apparent. 

Figure  13  displays  the  area  indices  of  fish  not  exposed  to  DEN  versus  TCE 
exposure.  Again,  no  strong  association  is  apparent. 

Figure 14 displays plots of area indices for fish exposed to DEN versus TCE
exposure. There seems to be some association. The area index for fish 105
appears lower than those of the other fish in the 10 mg/L DEN and 1.4 mg/L TCE
group.


Index 

In  this  subsection  we  investigate  associations  between  the  count  index  and 
area  index  by  treatment  group. 

Figure 15 displays scatter plots of the count index versus the area index by
treatment group. Table 2 displays the estimated parameters of the least squares
regressions by treatment group. The fitted relation is

\[ \frac{\text{Mean Area of positive hepatocytes}}{\text{Mean Area of Region of Interest}} = A + B \, \frac{\text{Count of positive hepatocytes}}{\text{Total Cells Counted}} \]

where (count of positive hepatocytes = smposnuc) and (Total Cells Counted =
tot_celi). Note that the estimates of A and B remain statistically the same
across treatments. All coefficients of determination (R^2) are high (at least
0.7, often higher) except for the data pertaining to DEN = 10, TCE = 0.1, which
gives R^2 = 0.42; this difference is also reflected in the scatter plot for that
treatment, which exhibits more variability than the other plots. There may be an
explanation based on experimental aberrations; the data for this treatment
should perhaps be re-examined.

Figures 16-21 display the scatter plots by treatment with the linear least
squares line shown. There is no strong evidence that the association between
the count index and area index changes by treatment.
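The per-treatment regressions described above can be sketched as follows. This is a minimal illustration of an ordinary least squares fit of Area Index = A + B (Count Index), not the procedure actually run for the report (which appears to have used AGSS). The function name `fit_line` and the data values are hypothetical. A 95% confidence interval would be the estimate plus or minus the appropriate t quantile times the standard error; the quantile is not computed here.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = A + B*x.

    Returns (A, B, se_A, se_B, r_squared).
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and residual variance on n - 2 degrees of freedom
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = rss / (n - 2)
    se_slope = math.sqrt(resid_var / sxx)
    se_intercept = math.sqrt(resid_var * (1.0 / n + mean_x ** 2 / sxx))
    r_squared = 1.0 - rss / syy
    return intercept, slope, se_intercept, se_slope, r_squared

# Hypothetical per-fish indices for one treatment group (not data from the study)
count_index = [0.02, 0.04, 0.06, 0.08, 0.10]
area_index = [0.007, 0.012, 0.020, 0.024, 0.032]
A, B, se_A, se_B, r2 = fit_line(count_index, area_index)
print(f"A = {A:.4f} (se {se_A:.4f}), B = {B:.3f} (se {se_B:.3f}), R^2 = {r2:.3f}")
```

Fitting each treatment group separately and comparing the intervals for A and B is one way to judge whether the count-to-area relation changes with treatment.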

TABLE 2

Least Squares Regressions
Area Index = A + B (Count Index)
[ ] = 95% Confidence Interval

TREATMENT
DEN    TCE    A        [95% CI for A]      B       [95% CI for B]    R^2
0      0      0        [-0.003, 0.003]     0.31    [0.27, 0.36]      0.98
10     0      0        [-0.13, 0.012]      0.31    [0.14, 0.48]      0.76
0      0.1    0.003    [-0.009, 0.015]     0.33    [0.18, 0.48]      0.82
10     0.1    0.003    [-0.02, 0.03]       0.33    [-0.12, 0.79]     0.42   (Missing data point)
0      1.0    0.002    [-0.001, 0.004]     0.42    [0.37, 0.47]      0.99
10     1.0    0.002    [-0.004, 0.008]     0.42    [0.34, 0.50]      0.97

Figure 22 displays box plots of (count of positive hepatocytes)/(total cells
counted) by treatment group. Figure 23 displays box plots of (mean area of
positive hepatocytes)/(mean area of region of interest) by treatment group for
those fish for which cells were also counted. Figure 24 (respectively Figure 26)
displays (count of positive hepatocytes)/(total cells counted) by TCE level for
0 mg/L DEN (respectively 10 mg/L DEN). Figure 25 (respectively Figure 27)
displays (mean area of positive hepatocytes)/(mean area of region of interest) by
TCE level for 0 mg/L DEN (respectively 10 mg/L DEN) for those fish for which
cells were counted. The figures show little association with treatment.

3.2.3 Preliminary Study of Partial Data From Sacrifice D

In this section we describe results of a preliminary analysis of the data
from Sacrifice D. The data considered are the control treatment group and the
(10 mg/L DEN, 0 mg/L TCE) treatment group. Each of these treatment groups
has 1 missing observation.

Figure 28 displays a scatter plot and least squares straight line for residual
liver area versus residual weight for the control group. Figure 29 is a similar
display for the (10 mg/L DEN, 0 mg/L TCE) group. Table 3 displays the
parameter estimates for the least squares lines. Since the 95% confidence
intervals for the parameter estimates overlap, the two regression lines are
statistically the same, and there is little difference in the residual standard
deviations.

Figure 30 displays the ratio (Mean Area of Positive Hepatocytes)/(Mean Area of
the Region of Interest); treatment with 10 mg/L DEN is associated with higher
and more variable ratios.

TABLE 3

Two Treatments from Sacrifice D
Least Squares Regression Estimates of the Linear Relation
Residual Liver Area = A + B (Residual Weight)

ESTIMATES
(standard errors)
[95% confidence intervals]

                                                        Residual
TREATMENT        A                 B                    std. dev.    R^2
control          0                 0.95                 0.31         0.91
                 (0.08)            (0.08)
                 [-0.17, 0.17]     [0.78, 1.13]
10 mg/L DEN,     0                 0.91                 0.42         0.83
0 mg/L TCE       (0.11)            (0.11)
                 [-0.24, 0.24]     [0.67, 1.16]

REFERENCE

IBM Corporation. A Graphical Statistical System (AGSS).

[Figures not reproduced. Figure numbers legible in this span: 6, 7, 8, 12, 14,
15, 17. Legible titles and axis labels: scatter plots (SSZ=94) of residual
liver area (y-axis) versus residual weight (x-axis) by treatment, for the
control with outlying fish 16 omitted, 0 DEN with 0.1 TCE (1 missing data
point), 10 DEN with 0.1 TCE (1 missing data point), 0 DEN with 1.0 TCE, and
10 DEN with 1.0 TCE; a plot of residual mean area of positive hepatocytes and
residual mean area of ROI, in which the outlying value is fish 21; strip
box/percentile plots by treatment group, with and without fish 21 (SSZ=93);
scatter plots of (area of positive hepatocytes)/(area of ROI) versus TCE level
for all fish (SSZ=94), for fish not exposed to DEN (SSZ=47), and for fish
exposed to DEN (SSZ=47); scatter plots for subsamples of (area of positive
hepatocytes)/(area of ROI) versus (count of positive hepatocytes)/(total cells
counted), including the 10 DEN, 0 TCE group.]

COMPARISON OF AREA INDICES IN MEDAKA LIVERS FOR
SACRIFICES AT DIFFERENT CONCENTRATION AND TIME
COMBINATIONS USING DATA AVAILABLE 5/19/94

D.  P.  Gaver 
P.  A.  Jacobs 

EXECUTIVE  SUMMARY 

1. Mean area indices were compared for early (B) and late (D) sacrifices within
tanks, so as to minimize (subtract/cancel) additive tank effects.

2. Comparisons between treatment types (DEN, TCE, controls) inevitably
include between-tank effects. These comparisons have not yet been made, in part
because the required data have only just become available for analysis.

3. Conclusion:

The comparisons of area indices in tanks 7 and 8 (0.1 TCE) and tanks 13 and 14
(10 DEN, 1 TCE) show depression of the area index for the longer exposure time
(D) as compared to the shorter time (B). The pooled-tank versions of the t-tests
(rows 4 and 5 of Table 2) show definite significance (p = 0.033) for the pooled
differences in tanks 7 and 8 (0.1 TCE), and even more so (p = 0.002) for tanks
13 and 14 when a prominent outlier (Fish 156) is removed.

This negative tendency is enhanced by comparison with the control tanks:
Table 3, rows 4 and 5 (with outlier removed).

4.  Additional  Comments 

Data behavior on individual slices for fish suggests that summarization by a
robust procedure (a mid-mean or broadened median rather than a mean) would be
effective and appropriate. Also, considerable variability between fish
experiencing the same treatment (tanks) is apparent, and should be adjusted for
by use of a more sophisticated (e.g., hierarchical) model. Within-fish
variability (across different ROIs) may well be expected, and may be an
important measure of response.

COMPARISON OF AREA INDICES IN MEDAKA LIVERS FOR
SACRIFICES AT DIFFERENT CONCENTRATION AND TIME
COMBINATIONS USING DATA AVAILABLE 5/19/94

D. P. Gaver

EXECUTIVE SUMMARY ADDENDUM


More  data  availability  has  made  possible  the  construction  of  the  enclosed 
graphs. 

(a) The hollow center dots (°) signify mean area indices; the vertically upper (*)
and lower (+) marks for each concentration are upper and lower 95% confidence
limits, using pooled difference data, tank for tank; this latter step tends to
remove an additive tank effect.

(b) Conclusions:

(b-1) Rightmost two columns (0 DEN, 1 TCE vs. 10 DEN, 1 TCE): there is a
decrease in AI with DEN increase that is weakly evident but not statistically
significant (at the 5% level).

(b-2) Third and fourth-from-right columns (0 DEN, 0.1 TCE vs. 10 DEN, 0.1
TCE): there is a (weak) increase in mean AI with DEN increase.

(b-3) Fifth and sixth-from-right columns (100 DEN vs. 10 DEN; 0 TCE): there is
a weak decrease in mean AI with DEN increase. The variance (width of
confidence limits) seems to increase with DEN, 0 TCE.

(b-4) The control difference (effect of added time, B to D) seems to increase
significantly: 0 is below the lower confidence limit.

(b-5) The analyses with and without the outlier fish 156 agree reasonably well.




1.  The  Experimental  Design 

Japanese  Medaka  are  exposed  to  differing  levels  of  DEN  and  TCE.  Each 
treatment  has  two  tanks.  Eight  animals  in  each  tank  are  sacrificed  4  Aug  1993; 
this  is  sacrifice  B.  Eight  additional  animals  in  each  tank  are  sacrificed  on  20  Aug 
1993;  this  is  sacrifice  D. 

Each sacrificed fish is exposed to BrdU for 72 hours prior to sacrifice; any cell
that is in S-phase during this time will have a BrdU marker. Each sacrificed fish
is frozen and sliced longitudinally. A third of the slices are stained with a
monoclonal antibody. This antibody stains nuclei that have been in S-phase
black; these nuclei are called positive.

A  region  of  interest  (ROI)  is  marked  on  the  liver  of  a  slice;  the  ROI  is  chosen 
to  maximize  the  number  of  hepatocytes  and  minimize  the  number  of 
nonhepatocytes.  The  area  of  cells  in  the  ROI  is  measured.  The  area  of  the  positive 
nuclei  is  measured.  Most  fish  have  5  slices;  however,  some  apparently  only  have 
3  slices. 

2.  The  Data 

As of 5/19/94, we have the following summary data for each animal: the area
of the ROIs (respectively, positive nuclei) added over the slices, riareasm
(respectively riheparea); the mean area of the ROIs (respectively, positive nuclei)
over the slices, xriarea (respectively xheparea); and the standard deviation of the
area of the ROIs (respectively, positive nuclei) over the slices, sdriarea
(respectively sdheparea). The standard deviations of the area of positive nuclei are
missing for 1/2 the fish from tanks 3, 4, 5, and 6 from sacrifice B; all of these
animals apparently had 3 slices stained; in tanks 3, 4, 5, and 6 the areas of
positive nuclei are actually areas of only positive hepatocytes.

An area index is computed for each animal. For animal i,

\[ a_i = \frac{\text{xheparea}}{\text{xriarea}} \times 100. \]
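As a small worked illustration of the area index just defined, the sketch below computes it from the two per-animal summaries. The function name `area_index` and the numerical values are hypothetical; only the variable names xheparea and xriarea come from the data description. The guard against a non-positive ROI area is an assumption added here, since a zero area cannot yield a meaningful ratio.

```python
def area_index(xheparea, xriarea):
    """Area index for one animal: 100 * (mean positive-nuclei area) / (mean ROI area)."""
    if xriarea <= 0:
        # A zero or negative ROI area cannot give a valid index
        raise ValueError("mean ROI area must be positive")
    return 100.0 * xheparea / xriarea

# Hypothetical summaries for one animal (not values from the study)
print(area_index(150.0, 7500.0))  # 2.0
```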

There  are  additional  missing  observations.  Later  we  obtained  the  raw  slice  data 
for  the  experiment.  Most  fish  for  the  later  data  seemed  to  have  5  slices 
considered.  Thus,  there  may  be  some  recording  errors  in  the  xheparea  and  xriarea 
in  the  data  available  at  5/19/94. 

3.  The  Question 

Is  there  a  difference  in  area  indices  between  sacrifice  B  and  D? 

In  more  detail  we  state  this  as  follows.  We  are  in  possession  of  sampled 
values  of  area  indices  from  fish  that  have  undergone  various  treatment 
combinations,  these  being  (at  least):  x  DEN,  y  TCE,  z  Exposure  time,  t  Tank. 
Assume  that  those  sample  values  can  be  combined  for  a  given  treatment 
combination  to  provide  a  meaningful  estimate  of  a  true  area  index  associated 
with  the  treatment.  Is  there  evidence  from  the  estimates  for  consistent  difference 
between  the  true  area  indices  associated  with  treatments? 

As of 5/19/94, we have data from sacrifices B and D for 7 treatment groups:
the control, 10 mg/L DEN, 100 mg/L DEN, (0 mg/L DEN with 0.1 mg/L TCE),
(10 mg/L DEN with 0.1 mg/L TCE), (0 mg/L DEN with 1 mg/L TCE), and
(10 mg/L DEN with 1 mg/L TCE). As a result, we will restrict our attention to
these groups.

4.  A  Graphical  Data  Analysis 

Figure 1 displays the area indices divided by 100 by tank for both sacrifices.
Figure 2 displays box plots of the same data. The area indices are larger and have
greater variability for the 100 mg/L DEN treatment group. There is the indication
that area indices may tend to be larger for sacrifice D in treatment groups control,
10 mg/L DEN, and 100 mg/L DEN. There is the indication that area indices are
smaller for sacrifice D for the treatment groups that are exposed to TCE: 0 mg/L
DEN with 0.1 mg/L TCE and 10 mg/L DEN with 1 mg/L TCE.

5.  Statistical  Analysis  Based  on  Ranks 

One  way  to  compare  two  samples  is  to  use  the  ranks  of  the  combined 
samples;  the  resulting  test  is  called  a  Mann-Whitney-Wilcoxon  test  or  a  rank  sum 
test;  Gibbons  (1985),  Mosteller  and  Rourke  (1973).  To  compute  the  statistics,  the 
two  samples  are  combined  and  ordered  from  smallest  to  largest  and  ranked;  the 
smallest  observation  is  ranked  1.  The  sum  of  the  ranks  for  each  sample  is 
computed.  The  null  hypothesis  for  the  test  is  that  the  two  samples  come  from  the 
same  distribution. 
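The rank-sum computation described above can be sketched as follows. This is an illustrative implementation only: the function name `rank_sums` and the sample values are hypothetical, and the use of average ranks for tied values is an assumption, since the text does not say how ties were handled. Converting a rank sum to a p-value requires tables such as those in Gibbons (1985) and is not attempted here.

```python
def rank_sums(sample_b, sample_d):
    """Return (sum of ranks of sample_b, sum of ranks of sample_d).

    Ranks are assigned over the combined sample, smallest = 1.
    Tied values share the average of the ranks they occupy (an assumption).
    """
    combined = [(v, "B") for v in sample_b] + [(v, "D") for v in sample_d]
    combined.sort(key=lambda pair: pair[0])
    sums = {"B": 0.0, "D": 0.0}
    i = 0
    while i < len(combined):
        # Find the end of the run of values tied with position i
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j + 1) / 2.0  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            sums[combined[k][1]] += avg_rank
        i = j + 1
    return sums["B"], sums["D"]

# Hypothetical area indices for one tank (not data from the study)
b = [1.2, 0.8, 1.5]
d = [2.0, 1.7, 0.9]
print(rank_sums(b, d))  # (8.0, 13.0)
```

The two rank sums always add to N(N + 1)/2 for a combined sample of size N, which gives a quick check on tabulated values.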

The  area  indices  from  sacrifices  B  and  D  for  each  tank  are  combined.  The 
combined  indices  for  each  tank  are  ordered  from  smallest  to  largest  and  ranked. 
The  sums  of  the  ranks  for  data  from  sacrifice  B  (respectively  D)  are  computed. 
The  results  appear  in  Table  1. 

The null hypothesis for the rank sum test is that the area indices from
sacrifices B and D come from the same distribution. Since one half of the
animals in tanks 3, 4, 5, and 6 for sacrifice B apparently have 3 slices
examined while the other animals have 5 slices examined, the variability of the
area indices for the animals may be different.

The p-values of the sum of ranks are less than 0.05 for tanks 1, 8, 13, and 14.
Since tanks 13 and 14 belong to the treatment group of 10 mg/L DEN and 1 mg/L
TCE, there is a strong indication that the treatment results in a lower index for
sacrifice D than for sacrifice B. One tank (tank 1) for the control has a significant
p-value; the other does not. One tank (tank 8) in the treatment group 0 DEN,
0.1 mg/L TCE has a significant p-value; the other does not.

6.  Statistical  Analysis  Based  on  Moments 

Moments,  especially  means,  are  often  a  useful  summary  of  data.  In  this 

section  we  describe  an  analysis  of  the  data  using  moments. 

The sample mean of the area indices \bar{a}_{B,j} (respectively \bar{a}_{D,j})
and sample variance of the area indices s^2_{B,j} (respectively s^2_{D,j}) for
sacrifice B (respectively sacrifice D) for tank j are computed.

The difference of the means

\[ d_j = \bar{a}_{D,j} - \bar{a}_{B,j} \]

and an estimate of the variance of the difference

\[ v_j = \frac{s^2_{B,j}}{n_{B,j}} + \frac{s^2_{D,j}}{n_{D,j}} \]

are computed, where n_{B,j} (respectively n_{D,j}) is the number of animals for
sacrifice B (respectively sacrifice D) for tank j. This variance estimate does
not assume that the tank area index variances are the same for sacrifices B and D.

The difference of the mean area index between sacrifices B and D over the
two tanks in each treatment group and an estimate of its variance are computed;
for example, tanks 1 and 2 are used to obtain a mean (D-to-B) difference for the
control group \bar{d}_C and an estimated variance v_C in the following manner:

\[ \bar{d}_C = \tfrac{1}{2}\,[d_1 + d_2], \qquad v_C = \tfrac{1}{4}\,[v_1 + v_2]. \]

If the assumption is made that area indices for each tank have the same
population variance within a treatment group, then an estimate of the variance of
the area index is

\[ s^2(A) = \frac{1}{n_{B,1} + n_{B,2} + n_{D,1} + n_{D,2} - 4} \sum_{j=1}^{2} \Big[ \sum_{i} \big(a_{B,j}(i) - \bar{a}_{B,j}\big)^2 + \sum_{i} \big(a_{D,j}(i) - \bar{a}_{D,j}\big)^2 \Big] \]

where a_{B,j}(i) is the area index for animal i of sacrifice B from tank
j = 1, 2 (first and second) for a treatment group, and a_{D,j}(i) is defined
similarly for sacrifice D. An estimate of the variance of d_j for this treatment
group would be

\[ v(c)_j = s^2(A) \left[ \frac{1}{n_{B,j}} + \frac{1}{n_{D,j}} \right]. \]

An estimate of the variance of the mean of the mean differences between sacrifice
B and D for the two tanks of the control, \bar{d}_C = \tfrac{1}{2}[d_1 + d_2], is

\[ v(c)_C = \tfrac{1}{4}\,[v(c)_1 + v(c)_2]. \]

Under the null hypothesis of no difference between the sacrifices, the statistic
\bar{d}_C / \sqrt{v_C} has an approximate t-distribution with
approximate-conservative degrees of freedom
(4 min(n_{B,1}, n_{B,2}, n_{D,1}, n_{D,2}) - 4).

Under the same null hypothesis we assume \bar{d}_C / \sqrt{v(c)_C} has the same
approximate t-distribution.
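The moment-based statistics of this section can be sketched as follows, assuming the D-minus-B sign convention for the differences and the two variance estimates (separate variances per tank and sacrifice, and a common variance pooled over the four samples of a treatment group). The function names and the sample values are hypothetical, and p-values are not computed because that would require a t-distribution routine outside the standard library.

```python
import math
from statistics import mean, variance

def tank_difference(b, d):
    """D-minus-B difference of mean area indices for one tank, and its
    variance estimate without assuming equal variances."""
    diff = mean(d) - mean(b)
    var = variance(b) / len(b) + variance(d) / len(d)
    return diff, var

def pooled_variance(samples):
    """Common within-sample variance pooled over several samples."""
    ss = 0.0
    for s in samples:
        m = mean(s)
        ss += sum((v - m) ** 2 for v in s)
    dof = sum(len(s) for s in samples) - len(samples)
    return ss / dof

def group_t_statistics(b1, d1, b2, d2):
    """t-statistics for the mean D-minus-B difference over two tanks.

    Returns (t with separate variances, t with common variance,
    conservative degrees of freedom).
    """
    d_1, v_1 = tank_difference(b1, d1)
    d_2, v_2 = tank_difference(b2, d2)
    d_bar = 0.5 * (d_1 + d_2)
    t_unequal = d_bar / math.sqrt(0.25 * (v_1 + v_2))
    s2 = pooled_variance([b1, d1, b2, d2])
    vc_1 = s2 * (1 / len(b1) + 1 / len(d1))
    vc_2 = s2 * (1 / len(b2) + 1 / len(d2))
    t_common = d_bar / math.sqrt(0.25 * (vc_1 + vc_2))
    dof = 4 * min(len(b1), len(b2), len(d1), len(d2)) - 4
    return t_unequal, t_common, dof

# Hypothetical area indices for the two tanks of one treatment group
tank1_b, tank1_d = [1.0, 1.4, 1.1, 1.3], [1.8, 2.1, 1.7, 2.0]
tank2_b, tank2_d = [0.9, 1.2, 1.0, 1.1], [1.6, 1.9, 2.2, 1.5]
print(group_t_statistics(tank1_b, tank1_d, tank2_b, tank2_d))
```

With eight fish per sacrifice in one tank and seven in one sample of the other, the degrees-of-freedom rule gives 4 x 7 - 4 = 24.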

Figure 3 displays the mean differences plus/minus 2 standard deviations, with
the standard deviations computed without the assumption of equal variances.

Figure 4 displays the mean differences plus/minus 2 standard deviations
computed using the assumption of equal variance across tanks.

Figures 3 and 4 appear about the same. The statistics for both computations
appear in Table 2 with the 2-sided p-values. The differences in mean area indices
for the control, 10 mg/L DEN, and 0 mg/L DEN with 0.1 mg/L TCE are significant.

The data displayed in Figure 1 indicate that the area index of fish 156 in the
treatment group 10 mg/L DEN with 1.0 mg/L TCE is unusually large. Figures 5
and 6 display the difference of mean area indices of sacrifice B and D by
treatment without fish 156. The t-statistics without Fish 156 are displayed in
parentheses in Table 2. Note that without Fish 156 there is a statistically
significant difference between the mean area indices for sacrifice D and B for the
treatment group 10 mg/L DEN with 1.0 mg/L TCE. Hence the mean area index
for sacrifice D is less than that for sacrifice B for this treatment group.

Table 4 displays statistics for the difference of mean area indices between
sacrifices B and D for each tank with 2-sided p-values. This table should be
compared to Table 1. Comparing the p-values of both tables indicates that, while
there is general agreement on whether or not there is a significant difference,
the agreement is not complete.

Table 3 displays statistics for the difference of mean area indices for sacrifices
B and D for each non-control treatment minus the difference of mean area indices
for sacrifices B and D for the control; the results without Fish 156 are displayed in
parentheses. The 2-sided p-values suggest that 1) the 0.1 mg/L TCE treatment with
no DEN results in the difference of the mean area index for B and D being more
negative than that for the control; 2) the treatment of 10 mg/L DEN with 1 mg/L
TCE results in the difference of the mean area index being more negative than
that of the control. If Fish 156 is removed, the treatment group of 10 mg/L DEN
with 1 mg/L TCE is even more significantly different from the control.

The t-statistic for the difference of mean area indices between sacrifices B and D
for treatment group 10 mg/L DEN with 0.1 mg/L TCE minus the mean difference
for the treatment group 0 mg/L DEN with 0.1 mg/L TCE is 1.89 (without the
assumption of equal variances) or 2.19 (if equal variances are assumed) with
approximate degrees of freedom 48; the two-sided p-values are
P{|t_48| > 1.89} = 0.06 and P{|t_48| > 2.19} = 0.03.

Thus, for treatment with 0.1 mg/L TCE, there is an indication that the change
in mean area indices between sacrifices B and D differs significantly according
to whether or not the fish were exposed to DEN; exposure to DEN increases
the mean difference.

The t-statistic for the difference of mean area indices between sacrifices B and D
for treatment group 10 mg/L DEN with 1 mg/L TCE minus the mean difference
for the treatment group 0 mg/L DEN with 1 mg/L TCE is -1.38 (without the
assumption of equal variances) or -1.67 (with the assumption of equal
variances) with approximate degrees of freedom 48. If the outlying fish 156 is
removed the t-statistics become -3.17 (without equal variance assumption) and
-3.53 (with equal variance assumption) with approximate degrees of freedom 40.
The two-sided p-values for these statistics are

P{|t_48| > 1.38} = 0.17 and P{|t_48| > 1.67} = 0.10,

P{|t_40| > 3.17} = 0.003 and P{|t_40| > 3.53} = 0.001.

Hence, if fish 156 is included there is no significant difference between the
change in mean area indices between sacrifices B and D for the treatment groups
having 1 mg/L TCE and either no DEN or 10 mg/L DEN. If fish 156 is excluded
then there is a significant difference, with the decrease in mean area index for
sacrifice D compared to sacrifice B being greater for the treatment group
10 mg/L DEN with 1.0 mg/L TCE than for the treatment group 0 DEN with
1.0 mg/L TCE.

TABLE 1


Ranks of Area Indices for Sacrifices B and D

        Treatment       Number of Fish    Rank Sum
Tank    DEN     TCE     B       D         B       D       p
 1      0       0       8       8         45      91      0.007
 2      0       0       8       7         56      64      0.198
 3      10      0       8       8         61      75      0.253
 4      10      0       8       7         56      64      0.198
 5      -       -       8       8         52      84      0.052
 6      -       -       8       8         59      77      0.191
 7      0       0.1     8       8         71      65      0.399
 8      0       0.1     7       8         71      49      0.047
 9      -       -       7       8         59      61      >0.522
10      10      0.1     8       7         55      65      0.168
11      10      1       8       8         70      66      0.439
12      10      1       8       8         78      58      0.164
13      10      -       8       8         90      46      0.010
14      10      1       8       7         79      41      0.047

A dash marks an entry that is missing from the table as received.

The entries labeled p in the table are the cumulative probability from each
extreme to the value of the statistic for the X-sample for the given sample size
m ≤ n (m is the size of the X-sample; n is the size of the Y-sample). Left-tail
probabilities are given for T_x ≤ m(N + 1)/2 and right-tail probabilities for
T_x ≥ m(N + 1)/2, where N = m + n; from Gibbons (1985).

TABLE 2
Differences of Mean Area Indices Between Sacrifices B and D
for Each Treatment
with Fish 156
(without Fish 156)

                Approximate    Different Variance               Common Variance
Treatment       Degrees of     Mean/          p-value           Mean/          p-value
DEN     TCE     Freedom        Std. Error     P{|T| > t}        Std. Error     P{|T| > t}
0       0       24             3.03           0.0058            3.09           0.0050
10      0       24             2.98           0.0065            3.05           0.0055
100     0       28             0.96           0.35              0.96           0.35
0       0.1     -              -2.23          0.036             -              0.033
10      0.1     -              0.72           0.48              -              0.47
0       1       -              1.13           0.27              1.13           0.27
10      1       -              -0.97          0.34              -1.04          0.31
                               (-3.55)        (0.002)           (-3.40)        (0.002)

A dash marks an entry that is missing from the table as received.

TABLE 4

Differences of Mean Area Indices Between Sacrifices B and D
By Tank
with Fish 156
(without Fish 156)

[Figure not reproduced. Legible panel title: EQUAL VARIANCES WITHOUT FISH 156.]


Donald  P.  Gaver 


and 

Patricia  A.  Jacobs 


Department  of  Operations  Research 
Naval  Postgraduate  School 
Monterey,  CA  93943 


1.  INTRODUCTION 

This  report  describes  an  analysis  of  cell  proliferation  data,  by  liver  slice,  from 
an  experiment  using  Japanese  Medaka.  Previous  work  has  used  summary  data 
from  the  same  experiment  (Gaver  and  Jacobs,  1994a,b).  Another  relevant 
reference  is  to  Morris  (1993).  A  brief  description  of  the  experiment  follows. 

The medaka are exposed to differing levels of DEN and TCE in tanks of
water. The treatment groups are: control, 10 mg/L DEN, 100 mg/L DEN,
0.1 mg/L TCE, (10 mg/L DEN with 0.1 mg/L TCE), 1 mg/L TCE, and (10 mg/L
DEN with 1 mg/L TCE). Each treatment group has two replicate tanks. Eight
animals in each tank were sacrificed on 4 August 1993; this is sacrifice B. Eight
additional animals in each tank were sacrificed on 20 August 1993; this is
sacrifice D.

Each sacrificed fish was exposed to BrdU for 72 hours prior to sacrifice; any
cell that is in S-phase during this time has a BrdU marker. Each sacrificed fish is
frozen and sliced longitudinally into 7-micron sections. A third of the slices are
stained with another agent. This agent stains nuclei with the BrdU marker black;
these nuclei are called positive. Five of these stained slices are analyzed
subsequently.

Five slices containing a portion of the liver are considered for each fish. A
region of interest (ROI) is marked on the slice; the ROI is chosen to attempt to
maximize the number of hepatocytes and minimize the number of
nonhepatocytes present. The area of all of the hepatocytes within the region of
interest is measured, and the area of positive nuclei within the region of interest
is measured. The number of hepatocytes in the ROI and the number of positive
hepatocytes in the ROI were also counted for half the fish in sacrifice B.

A count measure of cell proliferation, the count index (CI), for a slice is the
number of positive hepatocytes in the ROI divided by the number of hepatocytes
in the ROI, multiplied by 100. Evaluation of this measure is very labor intensive.
As an alternative, the following area index (AI) is used:

\[ \text{AI} = \text{Area Index} = \frac{\text{Area of positive nuclei in the ROI}}{\text{Area of hepatocytes in the ROI}} \times 100. \]

The area index is easier to obtain; however, it does not quantify cells in S-phase
exactly as does the CI, since cells are of different sizes, as are the areas resulting
from the slicing process, which are the result of a random intersection with the
cell.

Figure 1 displays a plot of the count index (divided by 100) versus area index
(divided by 100), computed by slice, for those fish of sacrifice B for which both
measures are available, along with a simple unweighted least-squares-fitted
straight line; also displayed is the least-squares line equation with the standard
errors of the coefficients displayed in parentheses below the coefficients. There
appears to be a satisfactory linear relationship between the count index and area
index, indicating that AI and CI are generally measuring the same response.
However, the variability of the area index increases as the count index increases;
this increase is generally associated with high DEN and TCE concentration
levels; its biological interpretation is not yet available. It suggests that cell sizes
become more variable at higher concentrations.

Figure  2  is  a  display  of  the  slice  area  indices  (divided  by  100),  by  fish,  for  the 
two  control  tanks.  Note  the  variability  between  fish  and  the  somewhat  greater 
variability  between  fish  in  tank  2  as  compared  to  those  in  tank  1. 

Several slice area data appear to be missing or are of doubtful validity.
These have been deleted from analytical consideration. They are listed in Table 1.
An alternative might have been to use robust statistical procedures throughout;
such procedures automatically down-weight highly discrepant observations.
Furthermore, examination of the weights indicates discrepancy so that
explanations can be sought. It seems likely that robust methods should be more
widely used in environmental toxicology. Robust statistical methods are
discussed in depth in Cox and Hinkley (1974). A less advanced treatment
appears in Koopmann (1987).
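As a simple illustration of how a robust summary limits the influence of a discrepant slice, the sketch below compares the ordinary mean with a 25% trimmed mean (one simple form of mid-mean). It is not the procedure used in this report, which deleted the doubtful slices instead; the function name `midmean` and the slice values are hypothetical.

```python
def midmean(values):
    """Mean of the observations left after dropping the lowest and highest
    quarter of the sorted data (a 25% trimmed mean)."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    k = len(ordered) // 4  # number of observations dropped from each end
    core = ordered[k:len(ordered) - k]
    return sum(core) / len(core)

# Hypothetical slice area indices for one fish, with one discrepant slice
slices = [0.021, 0.019, 0.024, 0.020, 0.090]
print("mean    =", sum(slices) / len(slices))
print("midmean =", midmean(slices))
```

With five slices per fish, one observation is dropped from each end, so a single aberrant slice cannot move the summary far.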

A summary of the findings of the data analysis is as follows.

a. Available data from sacrifice B suggest that there is a reasonably strong
linear association between the count index and the area index. For ease of
analysis the area index has been used throughout.

TABLE 1


Slices Not Considered in the Data Analysis

MISSING SLICES: SACRIFICE B
Missing measurements are left blank.

ID         No. of Slices   Tank   Treatment          Reason
9273802    All             8      0.1 TCE            Blank field
9273811    All             9      10 DEN, 0.1 TCE    Blank field

MISSING SLICES: SACRIFICE D
Missing values appear to be coded with 0.

ID         No. of Slices   Tank   Treatment          Reason
9273986    All             2      Control            Area ROI = Positive Area = 0
9285049    All             10     1 TCE              Area ROI = Positive Area = 0
9274001    All             4      10 DEN             All positive areas = 0 with 2 slices
                                                     having 1 mit. hep.
9285079    All             14     10 DEN, 1 TCE      All positive areas = 0 with 3 slices
                                                     having 1 mit. hep.
9285022    1               7      0.1 TCE            Area of ROI = 247.16, others of
                                                     order 7500
9285043    1               9      10 DEN, 0.1 TCE    Positive area of slice = 0;
                                                     0 mit. cells
9285073    1               13     10 DEN, 1 TCE      Positive area of slice = 0;
                                                     0 mit. cells

b. There is evidence that the variances of the slice area indices over fish exposed
to various treatments are approximately equal to the corresponding means.
This relationship can lead to misleading conclusions if standard statistical
procedures are used uncritically; furthermore, the results are less efficient
than necessary. However, the variances of the square roots of the slice area
indices for fish subjected to the various treatment (DEN and TCE) levels
appear approximately constant, i.e. far less dependent on the corresponding
mean values. Consequently, the square roots of the area indices are used in
subsequent data analyses. The underlying reason for this behavior is that
positive counts are rare random events and hence tend to be approximately
Poisson distributed. The square root transformation is known to stabilize the
variance of such counts; see Miller (1986), p. 59. The same transformation
should, and here does, stabilize the variance of count-associated areas.

c. The means of the fish mean square root of area indices are considered for both
sacrifices. There is generally more treatment effect for sacrifice B than for
sacrifice D: the p-values in row 3 of Tables 2 and 3 for the overall analysis
of variance are always larger for sacrifice D than for sacrifice B. Four out
of five of the treatment means are significantly larger (at the 95% level)
than that for the control for sacrifice B. There is no significant difference
(95% level) between the treatment means and the control mean for sacrifice D.
For sacrifice B, the treatment means for two out of the three levels of TCE
with 10 mg/L DEN are significantly larger than those without DEN; there is no
significant difference for sacrifice D.

d. The control mean for sacrifice D is significantly larger than the control mean
for sacrifice B. The other treatment means for sacrifice D are not significantly
different from those for sacrifice B.

e. There is some suggestion that in the later sacrifice D, the presence of TCE
lowers the mean of the area index. The biological mechanisms likely to explain
this behavior are not yet available.

The above results are from one-way analysis of variance (ANOVA) of the
square roots of the area indices, augmented by multiple comparison methods; the
latter allow all possible pairwise comparisons to be made with a specified
experiment-wise error rate, here 5%. Two multiple comparison methods were
used: simultaneous confidence intervals using the Studentized range distribution,
and Studentized maximum modulus confidence intervals.

Brief  Overall  Summary  of  Findings  to  Date 

The above can be briefly, and simplistically, summarized as follows. While it
can be said that there is a statistically significant difference between mean
responses (√AI) to the various treatments with DEN and TCE, no simple and
interpretable dose-response patterns have been found. In particular, response
does not appear to increase (or decrease) systematically with dose increase,
where "dose" includes time of exposure as well as increases in chemical
concentration levels. It remains to be seen whether this inconclusiveness will
be lessened by the analysis of more data (later sacrifices) or by finding that
experimental problems or biases occurred, or, more excitingly, whether the
dose-responses observed can be explained by biological mechanism and
essentially reappear when further experiments and data analyses are conducted.

Section 2 presents results of graphical displays of the data. Section 3 presents
results of exploratory analyses of variance. Results of exploratory linear
regression models are presented in Section 4. Multiple comparison results appear
in Section 5.


2.  GRAPHICAL  SUMMARIES  OF  THE  SLICE  AREA  INDICES 

Figure 3a (respectively 3b) displays boxplots of the slice area indices by tank
for sacrifice B (respectively sacrifice D). The upper side of the box is at the 75%
quantile of the tank area indices. The lower side of the box is at the 25% quantile
of the tank area indices. The circles are the means and the middle bar is the
median. The boxplots may be viewed as a graphical one-way analysis of
variance. There is some tank effect within a treatment group. The greatest
dose-response effect is clearly for the 100 DEN treatment.

Figures  4a  and  4b  display  plots  of  the  mean  of  the  area  indices  for  each  tank 
versus  the  variance  of  the  area  indices  for  each  tank.  Also  displayed  is  a  45°  line. 
There  appears  to  be  a  linear  relationship  between  the  mean  and  variance;  in  fact, 
the  variances  appear  to  be  approximately  equal  to  the  means.  This  relationship 
between  the  means  and  variances  may  lead  to  misleading  results  if  analysis  of 
variance  techniques  are  applied  directly  to  the  slice  area  indices;  Miller  (1986) 
and  Box  (1954)  discuss  the  effects  of  inequality  of  variance  on  one-way  analysis 
of  variance.  It  is  believed  that  such  effects  may  well  appear  often  in 
environmental  toxicology,  particularly  where  counts,  or  count-like  phenomena, 
are  found. 

As noted earlier, one standard transformation that can be applied to data with
variances approximately equal to means, in an attempt to make the variance of
the transformed data more nearly constant, is the square root transformation;
see Miller (1986). Figures 5a and 5b display plots of the mean of the square root
of the area indices for each tank versus the variance of the square root of the
area indices for each tank. The variances now appear unrelated to the
corresponding means. Figures 6a and 6b display boxplots of the square roots of
the slice area indices by tank. Note that the lengths (heights between quartiles)
of the boxes are less variable than are those for the boxplots of the raw area
indices themselves. In the remainder of this paper the square root of area
indices will be used.
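The variance-stabilizing effect of the square root can be checked numerically.
The following is a minimal sketch, assuming exactly Poisson counts (an
idealization; the area indices here are only count-like). The function names
are illustrative and are not part of the original analysis.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(X = k) for mean lam, computed on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def variance_of_sqrt(lam):
    """Var(sqrt(X)) for X ~ Poisson(lam), obtained by summing the pmf.

    Uses Var(sqrt(X)) = E[X] - (E[sqrt(X)])^2 with E[X] = lam.
    The sum is truncated far out in the upper tail.
    """
    kmax = int(lam + 20.0 * math.sqrt(lam) + 50)
    mean_sqrt = sum(math.sqrt(k) * poisson_pmf(k, lam) for k in range(kmax + 1))
    return lam - mean_sqrt ** 2


if __name__ == "__main__":
    # Raw variance equals the mean; the variance of the square root stays near 1/4.
    for lam in (10.0, 40.0, 160.0):
        print(f"mean = {lam:6.1f}  Var(X) = {lam:6.1f}  "
              f"Var(sqrt X) = {variance_of_sqrt(lam):.3f}")
```

Under this assumption the raw variance grows in proportion to the mean, while
the variance of the square root remains close to 1/4 across a sixteen-fold
range of means, which is the pattern seen in Figures 4 and 5.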

3.  EXPLORATORY  ANALYSES  OF  VARIANCE 

Results of exploratory analyses of variance appear in Tables 2-4. The basic
data are summaries of area indices and the square roots of area indices for each
fish. Since the boxplots indicate that the 100 DEN treatment is associated with
much larger area indices, the analyses of variance were done with and without
100 DEN. Table 2 shows the results by tank. Recall that small p-values will
indicate tank, hence treatment, effect. The first row of the table indicates that the
tank means of the fish mean area index are significantly different. However, the
second row of the table indicates that the tank means of the logarithms of the
variance (log variance) of area indices for each fish are also significantly different.

TABLE 2

p-Values for Exploratory ANOVA of Area Indices (AI) by Tank
(AI = [(Positive Area)/ROI Area] x 100)
(Small p-Values Indicate Chemical-Tank Effect)

                                   Sacrifice B                Sacrifice D
Data                               with         without       with         without
                                   100 DEN      100 DEN       100 DEN      100 DEN

Mean slice AI for each fish        < 10^-16     3 x 10^-5     6 x 10^-16   2.9 x 10^-2
in a tank

Log variance of slice AI for       4 x 10^-11   2 x 10^-5     3 x 10^-2    0.54
each fish in a tank

Mean slice √AI for each fish       4 x 10^-16   8 x 10^-7     1 x 10^-14   1.6 x 10^-2
in a tank

Log variance of slice √AI for      0.16         0.25          0.89         0.97
each fish in a tank

The graphical analysis has already indicated this difference. Now recall that one
of the assumptions of analysis of variance is that the data come from populations
with equal variance, so apply the square-root transformation. Rows 3 and 4
report results using the square roots of the area indices. Note that the results of
row 4 indicate that there is no significant difference between the tank means of
the log variance of the slice square root of area index for each fish. However, the
results of row 3 indicate that there is significant difference between the tank
means of the mean of the slice square root area indices for each fish; note that the
p-values for the tank means are smaller using the fish mean square root of area
indices rather than the raw (untransformed) area indices for the analysis of
variance (without the 100 DEN treatment). Thus, the difference in variances of
the raw (untransformed) area indices appears to have masked some of the
difference in means, presumably resulting from treatment effects.

Table  3  displays  results  for  analyses  of  variance,  with  the  two  tanks  in  each 
treatment  combined.  Once  again,  the  analysis  of  variance  using  the  mean  slice 
square  root  of  the  area  indices  for  each  fish  indicates  that  there  is  a  significant 
difference  between  treatment  means  of  the  means,  even  without  100  DEN. 

Table  4  displays  results  of  an  analysis  of  variance  of  the  mean  square  root  of 
area  indices  for  each  fish  in  a  treatment  but  without  the  control.  Once  again  there 
is  evidence  of  significant  differences  between  the  treatment  means. 

We conclude that there is a definite treatment effect, i.e. response to different
treatment levels, even if the control (no treatment) and the "strong" 100 DEN
treatment responses are removed.
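The one-way analyses of variance summarized in Tables 2-4 rest on the usual
between/within decomposition of the per-fish summaries. The following is a
minimal sketch of that computation; the data values in the usage note are
hypothetical, and the original analysis was not necessarily carried out this
way. Converting the F statistic to a p-value requires the F distribution,
which is available in standard statistical software but is not reproduced here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way layout.

    groups: list of lists; each inner list holds one value per fish
    (for example, the fish mean of sqrt(AI)) for one tank or treatment.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = 0.0  # variation of group means about the grand mean
    ss_within = 0.0   # variation of fish about their own group mean
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((y - group_mean) ** 2 for y in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For example, `one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])` gives the F
statistic together with 2 and 6 degrees of freedom for three hypothetical
groups of three fish each.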




TABLE 3

p-Values for Exploratory ANOVA of Area Indices (AI) by Treatment
(AI = [(Positive Area)/ROI Area] x 100)
Treatment: 2 Tanks Combined
(Small p-Values Indicate Treatment Effect)

                                   Sacrifice B                Sacrifice D
Data                               with         without       with         without
                                   100 DEN      100 DEN       100 DEN      100 DEN

Mean slice AI for each fish        1 x 10^-16   1 x 10^-5     1 x 10^-16   2 x 10^-2
in a treatment

Log variance of slice AI for       4 x 10^-13   3 x 10^-6     1 x 10^-4    0.33
each fish in a treatment

Mean slice √AI for each fish       2 x 10^-16   9 x 10^-7     4 x 10^-16   9 x 10^-3
in a treatment

Log variance of slice √AI for      0.19         0.46          0.56         0.89
each fish in a treatment

TABLE 4

p-Values for Exploratory ANOVA of Area Indices (AI) by Treatment
Without Control
(AI = [(Positive Area)/ROI Area] x 100)
Treatment: 2 Tanks Combined
(Small p-Values Indicate Treatment Effect)

                                   Sacrifice B                Sacrifice D
Data                               with         without       with         without
                                   100 DEN      100 DEN       100 DEN      100 DEN

Mean √AI for fish                  0            1.9 x 10^-3   5.9 x 10^-15 7.4 x 10^-3

4.  EXPLORATORY  LINEAR  REGRESSION 

Tables 5-8 report results of fitting exploratory linear regression models to
the square root of the slice area indices for each fish. For Tables 5 and 6, the
covariates are the level of DEN minus its mean; the level of TCE minus its mean;
and an interaction term: {(level of DEN minus its mean) times (level of TCE
minus its mean)}. The means were subtracted to give the interaction term a value
other than 0 if the level of DEN or the level of TCE is 0. The linear regressions
were fit with and without the 100 DEN treatment. The results of Table 5 are for
sacrifice B and those of Table 6 are for sacrifice D.
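The effect of centering on the interaction term can be illustrated with a short
sketch. It assumes, purely for illustration, one observation per treatment
combination of DEN in {0, 10} and TCE in {0, 0.1, 1}; the actual numbers of
fish per treatment differ, so the means used in Tables 5 and 6 are not exactly
those computed here. The function name is hypothetical.

```python
def centered_design(den_levels, tce_levels):
    """Build rows (1, DEN - mean, TCE - mean, product of centered terms)."""
    den_mean = sum(den_levels) / len(den_levels)
    tce_mean = sum(tce_levels) / len(tce_levels)
    rows = []
    for den, tce in zip(den_levels, tce_levels):
        den_c = den - den_mean
        tce_c = tce - tce_mean
        rows.append((1.0, den_c, tce_c, den_c * tce_c))
    return rows


# Hypothetical layout: one observation for each of the six treatments.
den = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
tce = [0.0, 0.1, 1.0, 0.0, 0.1, 1.0]

raw_interaction = [d * t for d, t in zip(den, tce)]
design = centered_design(den, tce)

if __name__ == "__main__":
    for raw, row in zip(raw_interaction, design):
        print(f"raw DEN*TCE = {raw:5.1f}   centered product = {row[3]: .3f}")
```

In this layout the raw product DEN x TCE vanishes for every treatment in which
either chemical is absent, whereas the centered product is non-zero for all six
treatments, so the interaction coefficient draws on all of the data.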

In both tables the values of R^2 are small when the 100 DEN data are excluded.
This implies that a linear function of the above explanatory variables does not
explain the data well. However, the standard errors of the estimates are also
small. This behavior suggests that, although there may be an association between
levels of DEN and TCE and the square root of the area index, that association is
not linear. Note that for sacrifice B all of the estimates of the coefficients are
significantly positive, suggesting that increasing levels of DEN and TCE are
associated with higher area indices. However, for sacrifice D, the estimates of
the coefficient of the level of TCE and of the coefficient of the interaction
term are significantly negative for the regression without the 100 DEN
treatment. This suggests that for the later sacrifice, exposure to TCE may
have an inhibitory effect. The only coefficient estimate that is significantly
different from 0 for sacrifice D when 100 DEN is included is that for exposure
to DEN.

TABLE 5
Sacrifice B: √AI

Linear Regression Coefficient Estimates with Standard Error
and 95% Normal Confidence Intervals

WITHOUT 100 DEN (mean DEN = 5.0, mean TCE = 0.372)

Term                                  Estimate   (SE)      [CI]
CONSTANT                              1.88       (0.067)   [1.75, 2.01]
DEN - mean DEN                        0.037      (0.005)   [0.028, 0.046]
TCE - mean TCE                        0.526      (0.148)   [0.237, 0.816]
(DEN - mean DEN) x (TCE - mean TCE)   0.025      (0.010)   [0.005, 0.045]
R^2 = 0.15; s.e. = 0.50

WITH 100 DEN (mean DEN = 18.81, mean TCE = 0.318)

Term                                  Estimate   (SE)      [CI]
CONSTANT                              1.77       (0.040)   [1.69, 1.85]
DEN - mean DEN                        0.029      (0.003)   [0.024, 0.033]
TCE - mean TCE                        0.710      (0.117)   [0.48, 0.94]
(DEN - mean DEN) x (TCE - mean TCE)   0.039      (0.008)   [0.02, 0.05]
R^2 = 0.56; s.e. = 0.50

TABLE 6
Sacrifice D: √AI

Linear Regression Coefficient Estimates with Standard Error
and 95% Normal Confidence Intervals

WITHOUT 100 DEN (mean DEN = 4.88, mean TCE = 0.37)

Term                                  Estimate   (SE)      [CI]
CONSTANT                              1.89       (0.07)    [1.76, 2.03]
DEN - mean DEN                                   (0.005)   [0.02, 0.04]
TCE - mean TCE                        -0.402     (0.151)   [-0.70, -0.11]
(DEN - mean DEN) x (TCE - mean TCE)   -0.025     (0.01)    [-0.05, -0.005]
R^2 = 0.08; s.e. = 0.50

WITH 100 DEN (mean DEN = 19.05, mean TCE = 0.315)

Term                                  Estimate   (SE)      [CI]
CONSTANT                              1.68       (0.04)    [1.60, 1.76]
DEN - mean DEN                        0.013      (0.003)   [0.008, 0.018]
TCE - mean TCE                        -0.050     (0.12)    [-0.29, 0.19]
(DEN - mean DEN) x (TCE - mean TCE)   0.0005     (0.008)   [-0.02, 0.02]
R^2 = 0.45; s.e. = 0.50

Tables 7 and 8 report results of fitting linear regressions with the covariate
being the level of TCE exposure to data from fish not exposed to DEN and to
data from fish exposed to 10 mg/L DEN; the dependent variable is the square
root of the slice area index. The R^2 values are very small, indicating a lack of
linear fit, but the estimated coefficients for TCE in the regressions using data
from fish exposed to 10 mg/L DEN are formally significant. Once again, this
behavior suggests that there may be an association but the association is not
linear.

For sacrifice B, there was no significant effect for the level of TCE for the fish
not exposed to DEN; for those fish exposed to 10 mg/L DEN the coefficient for
level of TCE is significantly positive, indicating that increasing levels of TCE
are associated with increasing (square roots of) area indices.


TABLE 7
Sacrifice B: √AI

Linear Regression Coefficient Estimates with Standard Error
and 95% Normal Confidence Intervals
(Replicate Tanks Pooled)

NO DEN

Term       Estimate   (SE)      [CI]
CONSTANT   1.18       (0.047)   [1.09, 1.27]
TCE        0.056      (0.080)   [-0.10, 0.21]
R^2 = 0.002; s.e. = 0.55

10 DEN

Term       Estimate   (SE)      [CI]
CONSTANT   1.47       (0.037)   [1.40, 1.54]
TCE        0.31       (0.06)    [0.18, 0.43]
R^2 = 0.09; s.e. = 0.44

0 DEN and 10 DEN

Term       Estimate   (SE)      [CI]
CONSTANT   1.14       (0.038)   [1.06, 1.21]
TCE        0.18       (0.051)   [0.08, 0.28]
DEN        0.038      (0.005)   [0.029, 0.047]
R^2 = 0.14; s.e. = 0.50

In sacrifice D, for those fish not exposed to DEN, the estimate of the
coefficient of TCE is not significantly different from 0. However, for those fish
exposed to 10 mg/L DEN the estimate of the coefficient of TCE is significantly
negative, suggesting that for the fish of the later sacrifice that were exposed to
DEN, the greater the level of TCE exposure, the smaller the (square root of the)
area index. This effect calls for biological explanation.


TABLE 8
Sacrifice D: √AI

Linear Regression Coefficient Estimates with Standard Error
and 95% Normal Confidence Intervals
(Replicate Tanks Pooled)

NO DEN

Term       Estimate   (SE)      [CI]
CONSTANT   1.35       (0.037)   [1.27, 1.42]
TCE        0.059      (0.062)   [-0.063, 0.181]
DEN        -          -         -
R^2 = 0.004; s.e. = 0.43

10 DEN

Term       Estimate   (SE)      [CI]
CONSTANT   1.71       (0.048)   [1.61, 1.80]
TCE        -0.189     (0.083)   [-0.352, -0.026]
DEN        -          -         -
R^2 = 0.02; s.e. = 0.56

0 DEN and 10 DEN

Term       Estimate   (SE)      [CI]
CONSTANT   1.39       (0.038)   [1.32, 1.47]
TCE        -0.061     (0.052)   [-0.16, 0.04]
DEN        0.027      (0.005)   [0.017, 0.036]
R^2 = 0.07; s.e. = 0.50

5.  MULTIPLE  COMPARISONS 

The exploratory analyses of variance strongly rejected the null hypothesis
that all the treatment means (even without the 100 DEN treatment) of the fish
mean square root of the area indices are equal. Rejection of the null hypothesis
does not indicate specifically which means are not equal. A method for
discovering which means differ is called a multiple comparisons procedure.
There are a number of different multiple comparisons procedures in the literature;
see Miller (1981). We will use two of them.

5.1 Simultaneous Confidence Intervals using Studentized Range Distribution

The first procedure uses the studentized range distribution to construct
simultaneous confidence statements about the true values of all differences of the
treatment means; this procedure constructs the Tukey (Studentized range
distribution) simultaneous confidence intervals. Table 9 describes the procedure
to obtain simultaneous 95% confidence intervals for all differences of treatment
means for one sacrifice without the 100 DEN treatment. The original procedure
requires that there be an equal number of fish in each treatment. However, Ott et
al. suggest step 4 in Table 9 if the numbers of fish in the treatments do not
differ by much.

5.1a  Treatment  Means  Minus  Control  Mean 

Figure 7 presents some of the 95% simultaneous confidence intervals of the
differences of treatment means for sacrifice B. It shows the confidence intervals
for the treatment means minus the control mean for the fish mean square root of
the area indices. Note that 4 out of the 5 intervals lie entirely above 0,
indicating that these treatments are associated with larger mean square root
area indices than the control. The greatest difference is that for the treatment
of 10 mg/L DEN with 1 mg/L TCE. However, since the confidence intervals
overlap, there is no apparent association between the treatment and the
magnitude of the differences in the means.

Figure 8 presents some of the 95% simultaneous confidence intervals for the
later sacrifice D. It shows the confidence intervals for the treatment mean minus
the control mean. Since the intervals include 0, none of the treatment means is
significantly different from the control mean.

TABLE 9

To Obtain Tukey (Studentized Range Distribution) Simultaneous 95%
Confidence Intervals for Treatment Mean Differences, One Sacrifice

1. There are 6 treatments: Control; 10 DEN; 0.1 TCE; 10 DEN with 0.1 TCE;
1 TCE; and 10 DEN with 1 TCE.

2. There are approximately 88 within degrees of freedom,

$$\sum_{i=1}^{6} (n_i - 1),$$

where $n_i$ is the number of fish in treatment $i$.

3. The 0.05 percentage point for the studentized range for 60 within-degrees-
of-freedom and 6 treatment means is 4.16, from published tables,
BIOMETRIKA Tables for Statisticians, Vol. 1. This is larger than the
percentage point for 88 within-degrees-of-freedom. Thus, the constructed
confidence intervals will be conservative: one can truly say that all pairwise
difference comparisons are made with (at least 95%) confidence.

4. Since the number of fish per treatment differs somewhat (due to missing
fish) the harmonic mean of the number of fish per treatment is used:

$$n = \frac{6}{\dfrac{1}{n_1} + \cdots + \dfrac{1}{n_6}},$$

where $n_i$ is the number of fish in treatment $i$.

5. The mean square within is

$$\mathrm{MS(within)} = \frac{\sum_{i=1}^{6} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2}{\sum_{i=1}^{6} (n_i - 1)},$$

where $y_{ij}$ is the mean √AI for fish $j$ in treatment $i$ and
$\bar{y}_{i\cdot}$ is the mean of the mean √AI for the fish in treatment $i$.

6. 95% confidence intervals for all pairs of means $\mu_i$ and $\mu_j$:

$$(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) \pm (4.16)\sqrt{\mathrm{MS(within)}/n}$$
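The steps of Table 9 can be sketched as follows. This is a minimal illustration
rather than the code used for the original analysis: the critical value is
taken from published tables and passed in as `q_crit` (4.16 in the text), and
the function name and the data layout (a mapping from treatment name to
per-fish means of √AI) are assumptions.

```python
import math
from itertools import combinations


def tukey_intervals(groups, q_crit):
    """Simultaneous intervals for all pairwise differences of treatment means.

    groups: dict mapping treatment name -> list of per-fish values.
    q_crit: tabulated upper percentage point of the studentized range.
    Unequal group sizes are handled with the harmonic mean (step 4).
    """
    k = len(groups)
    means = {name: sum(v) / len(v) for name, v in groups.items()}

    # Step 2: within degrees of freedom.
    df_within = sum(len(v) - 1 for v in groups.values())

    # Step 5: mean square within.
    ss_within = sum(
        sum((y - means[name]) ** 2 for y in v) for name, v in groups.items()
    )
    ms_within = ss_within / df_within

    # Step 4: harmonic mean of the group sizes.
    n_harmonic = k / sum(1.0 / len(v) for v in groups.values())

    # Step 6: common half-width for every pairwise difference.
    half_width = q_crit * math.sqrt(ms_within / n_harmonic)
    return {
        (a, b): (means[a] - means[b] - half_width,
                 means[a] - means[b] + half_width)
        for a, b in combinations(groups, 2)
    }
```

A difference is declared significant when its interval excludes 0, which is
how Figures 7 through 11 are read.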


5.1b Treatment Means for Treatments with Exposure to 10 mg/L DEN Minus
Those without Exposure to 10 mg/L DEN

Figure 9 displays the sacrifice B 95% simultaneous confidence intervals for the
difference between the treatment means for mean fish square root of the area
indices for those treatments with 10 mg/L DEN minus the treatment means for
those treatments without 10 mg/L DEN, by level of TCE exposure. Note that the
treatment mean with 10 mg/L DEN is significantly larger than that without
10 mg/L DEN for 0 mg/L TCE and 1 mg/L TCE, since the confidence intervals do
not include 0. There is no significant difference for 0.1 mg/L TCE since the
confidence interval includes 0.

Figure 10 displays a similar plot for the later sacrifice D. There is no
significant difference between the treatment means with 10 mg/L DEN and those
without.

5.1c  Treatment  Means  for  Sacrifice  D  Minus  Treatment  Means  for  Sacrifice  B 

Simultaneous  confidence  intervals  are  computed  for  all  differences  of  the 
treatment  means  of  the  fish  mean  square  root  of  the  area  indices  for  sacrifices  B 
and  D  combined.  Figure  11  displays  six  of  the  95%  simultaneous  confidence 
intervals.  It  displays  the  95%  confidence  intervals  for  the  difference  in  treatment 
means  between  sacrifice  D  and  sacrifice  B.  The  only  significant  difference  is  for 
the  control  where  the  mean  of  the  fish  mean  square  root  of  area  indices  for 
sacrifice  D  is  significantly  larger  than  that  for  sacrifice  B. 

5.2  Studentized  Maximum  Modulus  Confidence  Intervals 

Simultaneous confidence intervals for the treatment means themselves can be
constructed using the studentized maximum modulus procedure; cf. Miller
(1981). The procedure is as follows for the mean fish square root area index for
the three treatment groups (10 mg/L DEN, 0 mg/L TCE), (10 mg/L DEN,
0.1 mg/L TCE), and (10 mg/L DEN, 1 mg/L TCE) for 1 sacrifice.

To Obtain Simultaneous Confidence Intervals for 3 Treatment Means using the
Studentized Maximum Modulus Distribution

1. Compute the within degrees of freedom for the three treatments,

$$d = \sum_{i=1}^{3} (n_i - 1),$$

where $n_i$ is the number of fish in treatment $i$.

2. Compute the mean square within

$$\mathrm{MS(within)} = \frac{\sum_{i=1}^{3} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2}{\sum_{i=1}^{3} (n_i - 1)},$$

where $y_{ij}$ is the mean square root of the area indices for fish $j$ in
treatment $i$ and $\bar{y}_{i\cdot}$ is the mean of the fish means in treatment $i$.

3. Find the upper 0.05 point of the studentized maximum modulus distribution
with parameters 3 (treatments) and $d$ degrees of freedom, $m(3, d)$. Tables can
be found in Miller (1981).

4. The three simultaneous 95% confidence intervals are

$$\bar{y}_{i\cdot} \pm m(3, d)\sqrt{\mathrm{MS(within)}/n_i}.$$
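The four steps above can be sketched in a few lines. As before, this is an
illustration under stated assumptions: the critical value m(3, d) must be
looked up in tables and is supplied as `m_crit`, and the function name and
list-of-lists data layout are hypothetical.

```python
import math


def smm_intervals(groups, m_crit):
    """Simultaneous intervals for the treatment means themselves.

    groups: list of lists of per-fish values, one inner list per treatment.
    m_crit: tabulated upper point of the studentized maximum modulus
            distribution for len(groups) means and d degrees of freedom.
    """
    means = [sum(g) / len(g) for g in groups]

    # Steps 1 and 2: within degrees of freedom and mean square within.
    d = sum(len(g) - 1 for g in groups)
    ss_within = sum(
        sum((y - m) ** 2 for y in g) for g, m in zip(groups, means)
    )
    ms_within = ss_within / d

    # Step 4: each mean gets its own half-width through its own n_i.
    intervals = []
    for g, m in zip(groups, means):
        half_width = m_crit * math.sqrt(ms_within / len(g))
        intervals.append((m - half_width, m + half_width))
    return intervals
```

Unlike the studentized range intervals of Section 5.1, each interval here uses
the actual number of fish in its own treatment, so no harmonic-mean adjustment
is needed.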

Figure 12 displays the 95% simultaneous confidence intervals for sacrifice B
for the means of the fish mean of the square root of the slice area indices for those
treatments having fish exposed to 10 mg/L DEN. The means appear about the
same for 0 mg/L TCE and 0.1 mg/L TCE. The mean for 1 mg/L TCE appears to be
somewhat larger.

Figure 13 displays the 95% simultaneous confidence intervals for sacrifice D
for the means of those treatments with 10 mg/L DEN by level of TCE. There is
some suggestion that the presence of TCE is associated with a lower mean of the
fish mean square root of the slice area indices.


The above analyses illustrate the use of statistical methods appropriate for the
kinds of data obtained by the medaka experiments. The methods of
transformation, analysis of variance, and multiple comparisons are useful and
powerful for the initial data analyses, suggesting some surprising dose-response
relations that are worthy of careful further biological investigation and
explanation. Alternative methods can also be applied, and should yield the same
general insights.


An Analysis of Female Breast Tissue Data
In Order to Predict Cancer


by 

D.  P.  Gaver  and  P.  A.  Jacobs 

Operations  Research  Department 
Naval  Postgraduate  School 
Monterey,  CA  93943 


1.  Introduction 

In  October  1994,  Dr.  D.  Malins  sent  us  data  dated  10/25/94  from  a  study  of 
female  breast  tissues.  The  data  are  measurements  from  breast  tissue  samples 
from  30  female  patients.  Fifteen  of  the  patients  underwent  reduction 
mammoplasty;  tissues  from  these  patients  are  considered  to  be  normal.  The  other 
15  patients  had  invasive  ductal  carcinoma;  some  of  the  samples  from  these 
patients  are  from  breast  tumors;  other  samples  are  from  microscopically  normal 
tissue  from  the  cancer  patients.  The  study  used  multiple  breast  tissue  samples 
from  some  patients. 

The data are measurements of hydroxymethyluracil (HMUra), fapyadenine
(Fapy-A), 8-hydroxyadenine (8-OH-A), fapyguanine (Fapy-G), and
8-hydroxyguanine (8-OH-G) from the breast tissue samples.

In this study we restrict our attention to the samples from the women who
underwent reduction mammoplasty (normal) and tissues from invasive ductal
carcinoma tumors (cancer). There are 68 samples from women who underwent
reduction mammoplasty and 10 samples from invasive ductal tumors for a total
of 78 samples.

We are interested in the ability of the covariates Fapy-A, 8-OH-A, Fapy-G,
and 8-OH-G to predict whether a tissue sample is cancerous or normal. The
sample type was recoded to be equal to 0 for a sample from a reduction
mammoplasty and 1 for a sample from an invasive ductal tumor. Logistic
regression models (cf. Collett (1991)) are used to describe and predict the data.

One  way  to  evaluate  the  usefulness  of  a  statistical  model  is  to  evaluate  how 
well  it  describes  the  data  used  to  estimate  the  model  (goodness-of-fit).  Another 
way  is  to  evaluate  how  well  the  statistical  model  predicts  new  data  that  was  not 
used  to  estimate  its  parameters.  This  latter  process  is  called  cross  validation;  it  is  a 
natural  and  well-accepted  procedure  for  assessing  the  quality  of  a  proposed 
prediction  methodology.  Mosteller  and  Tukey  (1977)  give  a  good  discussion. 

In  this  paper,  simulation  is  used  to  evaluate  logistic  regression  models  with 
different  numbers  of  covariates.  In  each  simulation  replication,  the  data  are 
randomly  allocated  to  one  of  two  data  sets.  One  data  set  is  then  used  to  estimate 
the  parameters.  The  estimated  model  is  then  used  to  predict  the  probability  that  a 
data  point  from  the  other  data  set  is  from  a  cancerous  tissue. 
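The random allocation used in each replication can be sketched as follows. The
sketch follows the description given in Section 3: each observation is assigned
to the fitting set or the prediction set with probability 1/2, and the
allocation is redrawn if the fitting set contains no cancer samples. The
function name is hypothetical, and the model-fitting step itself (done in
S-PLUS in the original work) is not shown.

```python
import random


def split_fit_predict(labels, rng):
    """Randomly allocate observation indices to a fitting or prediction set.

    labels: list of 0 (normal) / 1 (cancer) sample types.
    rng: a random.Random instance, so that replications are reproducible.
    Returns (fit_indices, predict_indices).
    """
    while True:
        fit, predict = [], []
        for i in range(len(labels)):
            if rng.random() < 0.5:
                fit.append(i)
            else:
                predict.append(i)
        # Redraw if the fitting set would contain no cancer samples.
        if any(labels[i] == 1 for i in fit):
            return fit, predict


if __name__ == "__main__":
    # 68 normal and 10 cancer samples, as in the data described above.
    labels = [0] * 68 + [1] * 10
    fit, predict = split_fit_predict(labels, random.Random(1))
    print(len(fit), "observations to fit;", len(predict), "to predict")
```

Repeating this split many times, fitting on one part and predicting the other,
gives the cross-validated error rates summarized below.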

Summary 

A summary of the results is as follows. Of the logistic regression models
considered, the logistic regression model with the best goodness-of-fit is, not
surprisingly, the one with the largest number of covariates: constant,
log (Fapy-A), log (8-OH-A), log (Fapy-G), and log (8-OH-G). However, this
model tends to be the weakest predictor. This model can be viewed as overfitting
the data. There are two logistic regression models that are better predictors:
regression Model I has these covariates: constant, log (Fapy-A), and
log (8-OH-A); the other logistic regression model (II) has these covariates:
constant and log (Fapy-A/8-OH-A). On the basis of the cross-validated
procedure described, regression Model II tends to predict the occurrence of
normal samples somewhat better than Model I does. However, Model I predicts
cancer samples somewhat better than Model II. What this means is that Model I
tends to give more false positives than Model II, whereas Model II tends to give
more false negatives than Model I. Model I describes the data it was fit to
somewhat better than Model II; this is not surprising since it involves one more
parameter than does Model II. All the logistic regression models considered had
more difficulty in predicting the cancer samples than they did with normal
samples: on the average about 1 out of 5 cancer samples was incorrectly
predicted.

Section  2  describes  the  logistic  regression  model.  Section  3  describes  a 
simulation  experiment  which  results  in  classifications  and  Section  4  reports 
results  from  that  simulation.  Sections  5  and  6  describe  additional  simulation 
experiments  and  present  their  results. 


2.  The  Logistic  Regression  Model 

Suppose we have responses which can either be 0 (from normal tissue) or 1
(from cancer tumor) and let $p_i$ be the probability that the $i$th response is
cancerous. The logistic regression model for the dependence of $p_i$ on the
values of $k$ explanatory variables $x_{i1}, x_{i2}, \ldots, x_{ik}$ associated
with the $i$th response is

$$\operatorname{logit}(p_i) = \log\bigl(p_i/(1-p_i)\bigr)
  = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}. \tag{2.1}$$

After rearrangement

$$p_i = \frac{\exp\{\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}\}}
             {1 + \exp\{\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}\}}. \tag{2.2}$$

In what follows we use the procedure in the standard statistical package S-PLUS
to estimate the parameters of the logistic regression models.
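
As an illustration only, the relation between the linear predictor and the
response probability in (2.1) and (2.2) can be evaluated in a few lines of
Python. This is a minimal sketch and not the S-PLUS procedure used for the
analysis; the function names and the coefficient values in the usage note are
hypothetical.

```python
import math

def logit(p):
    """Left-hand side of (2.1): log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def predicted_probability(beta, x):
    """Right-hand side of (2.2).

    beta = [beta_0, beta_1, ..., beta_k]; x = [x_1, ..., x_k] holds the
    covariate values (e.g. logarithms of concentrations) for one observation.
    """
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    # 1 / (1 + exp(-eta)) is algebraically the same as exp(eta) / (1 + exp(eta)).
    return 1.0 / (1.0 + math.exp(-eta))
```

For example, with the made-up coefficients `beta = [0.3, -1.2, 0.7]` and
covariates `x = [0.5, 2.0]`, the linear predictor is 1.1 and
`logit(predicted_probability(beta, x))` recovers that value.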

Note  that  the  data  analysis  conducted  to  date  has  been  confined  to  use  of  the 
logistic  model  exclusively,  and  to  expression  of  the  covariates  or  explanatory 
variables  in  terms  of  the  logarithms  of  their  concentrations.  Other  options  for  the 
link  function  (connection  between  response  probability  and  covariates,  here  the 
logistic)  and  covariate  representation  could  profitably  be  investigated. 

It  is  strongly  recommended  that  development  of  a  truly  mechanistic  model 
that  represents  biological  process  be  conducted,  and  the  latter  used  for 
prediction.  Work  in  this  direction  has  been  initiated  with  Dr.  J.  Burkhart  of 
NIEHS,  and  should  extend  well  to  the  present  problem. 

3.  A  Simulation  Experiment  Resulting  in  Classifications 

Each  simulation  experiment  consists  of  100  replications.  In  what  follows  an 
observation  consists  of  type  of  sample  (0  =  normal,  1  =  cancerous)  and  the 
corresponding  covariates. 

Each  replication  consists  of  the  following  operations.  Each  observation  or  data 
point  is  randomly  allocated  to  the  data  set  used  to  fit  the  statistical  model  (FD)  or 
to  the  data  set  which  the  fitted  model  is  used  to  predict  (PD).  If  the  data  set  used 
to  fit  the  model  contains  no  cancer  samples  the  randomization  is  done  again. 
Thus, on the average, 1/2 of the observations are used to fit the statistical model
and 1/2 of the observations are reserved to assess the predictive ability of the
fitted  model.  Some  FD  data  sets  contain  as  few  as  1  cancer  sample,  so  the 
prediction  assessments  may  be  conservative.  Other  sampling  procedures  are 
described  in  Section  5. 

One or more logistic regression models are estimated using the data set FD.
The parameter estimates are then used in (2.2) to evaluate the predicted
probability of the sample being cancerous for each observation in the data set PD.
In order to make a Yes-No statement a probability cut point must be chosen; this
is actually a parameter. For this study we choose p = 0.5: if the predicted
probability is less than or equal to 0.5, a classification of 0 is assigned to the
observation; if the predicted probability is greater than 0.5, a classification of 1 is
assigned to the observation.

In  order  to  assess  classification  accuracy  the  absolute  difference  between  the 
classification  and  the  sample  type  (recoded  to  be  0  for  normal  and  1  for 
cancerous)  is  computed  for  each  observation  in  PD.  The  mean  of  the  absolute 
difference  is  computed  for  the  normal  observations.  The  mean  of  the  absolute 
differences  is  computed  for  the  cancer  observations.  Note  that  perfect  prediction 
classification  would  result  in  a  mean  of  0.  If  all  the  predictions  are  wrong,  then 
the  mean  will  be  1.  Thus,  the  mean  measures  the  fraction  of  the  predictions  that 
are  in  error. 

The  goodness  of  fit  is  also  evaluated.  The  procedure  is  as  follows.  The  fitted 
probability  of  being  cancerous  is  evaluated  for  each  data  point  in  FD.  If  the  fitted 
probability  is  less  than  or  equal  to  0.5,  a  classification  of  0  is  assigned  to  the 
observation;  if  the  fitted  probability  is  greater  than  0.5,  then  a  classification  of  1  is 
assigned  to  the  observation.  The  absolute  difference  between  the  classification 
and  the  sample  type  (recoded  to  be  0  for  normal  and  1  for  cancerous)  is 
computed  for  each  observation  in  FD.  The  mean  of  the  absolute  difference  is 
computed  for  the  normal  observations.  The  mean  of  the  absolute  differences  is 
computed  for  the  cancer  observations. 

Note  that  a  perfect  fit  classification  would  result  in  a  mean  of  0.  If  all  the  fit 
classifications  were  wrong,  then  the  mean  would  be  1.  Thus,  the  mean  represents 
the  fraction  of  fit  classifications  that  are  in  error. 
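
One replication of the experiment described in this section can be sketched in
Python as follows. This is an illustration under stated assumptions, not the
code used for the report: the covariate values are synthetic, the fit uses
plain gradient ascent with a small ridge penalty (added here only to keep the
estimates finite when the classes separate perfectly) in place of the S-PLUS
procedure, and all function names are hypothetical. Only the counts of 68
normal and 10 cancer observations come from the report.

```python
import math
import random

def sigmoid(z):
    """Numerically stable inverse logit, as in (2.2)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, steps=500, rate=0.5, ridge=1e-3):
    """Gradient ascent on a lightly penalized mean log-likelihood.

    Assumption: stands in for the S-PLUS fit. Each row starts with 1.0
    so that beta[0] is the constant term.
    """
    k, n = len(rows[0]), len(rows)
    beta = [0.0] * k
    for _ in range(steps):
        grad = [-ridge * b for b in beta]
        for x, y in zip(rows, labels):
            resid = y - sigmoid(sum(b * v for b, v in zip(beta, x)))
            for j in range(k):
                grad[j] += resid * x[j] / n
        beta = [b + rate * g for b, g in zip(beta, grad)]
    return beta

def class_errors(beta, data, cut=0.5):
    """Mean absolute classification error for normal (0) and cancer (1)."""
    errors = []
    for sample_type in (0, 1):
        diffs = []
        for x, y in data:
            if y == sample_type:
                prob = sigmoid(sum(b * v for b, v in zip(beta, x)))
                diffs.append(abs((1 if prob > cut else 0) - y))
        errors.append(sum(diffs) / len(diffs) if diffs else None)
    return tuple(errors)

def one_replication(data, rng):
    """Random FD/PD allocation, redone if FD contains no cancer sample."""
    while True:
        fd, pd_set = [], []
        for obs in data:
            (fd if rng.random() < 0.5 else pd_set).append(obs)
        if any(y == 1 for _, y in fd):
            break
    beta = fit_logistic([x for x, _ in fd], [y for _, y in fd])
    return class_errors(beta, fd), class_errors(beta, pd_set)

def synthetic_data(rng):
    """Made-up single covariate for 68 normal and 10 cancer observations."""
    data = [([1.0, rng.gauss(0.0, 1.0)], 0) for _ in range(68)]
    data += [([1.0, rng.gauss(2.0, 1.0)], 1) for _ in range(10)]
    return data

if __name__ == "__main__":
    rng = random.Random(0)
    data = synthetic_data(rng)
    results = [one_replication(data, rng) for _ in range(100)]
    cancer = [pred[1] for _, pred in results if pred[1] is not None]
    print("mean predicted-classification error, cancer:",
          sum(cancer) / len(cancer))
```

Each call to `one_replication` returns the pair of fit-classification errors
(on FD) and the pair of predicted-classification errors (on PD); averaging
these over 100 replications gives quantities of the kind reported in the
tables, although the numbers from synthetic data are of course not comparable
to those in the report.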

4. The Classification Simulation Results

The simulation results for the following logistic regression models are
reported. All logistic regressions have a constant term.

Model   Covariates

I:      log (Fapy-A), log (8-OH-A)

II:     log (Fapy-A/8-OH-A)

III:    log (Fapy-A/8-OH-A), log ((Fapy-A) x (8-OH-A))

IV:     log (Fapy-A), log (8-OH-A), log (Fapy-G), log (8-OH-G)

V:      log (Fapy-A/8-OH-A), log (Fapy-G/8-OH-G)

All of these models were estimated using the same sets of data and used to
predict the same sets of data. Note that Model I and Model III are equivalent but
the parameterization of Model III may be more biologically meaningful.
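
The equivalence of Models I and III can be checked directly. Writing $A$ for
Fapy-A and $B$ for 8-OH-A, and using $\gamma_1, \gamma_2$ as names (introduced
here for illustration) for the Model III coefficients:

```latex
\begin{align*}
\gamma_1 \log(A/B) + \gamma_2 \log(AB)
  &= \gamma_1(\log A - \log B) + \gamma_2(\log A + \log B) \\
  &= (\gamma_1 + \gamma_2)\log A + (\gamma_2 - \gamma_1)\log B ,
\end{align*}
so Model III with $(\gamma_1, \gamma_2)$ coincides with Model I with
\[
\beta_1 = \gamma_1 + \gamma_2, \qquad \beta_2 = \gamma_2 - \gamma_1,
\qquad\text{i.e.}\qquad
\gamma_1 = \tfrac{1}{2}(\beta_1 - \beta_2), \quad
\gamma_2 = \tfrac{1}{2}(\beta_1 + \beta_2).
\]
```

Because the map between the two coefficient pairs is one-to-one, both models
give the same fitted and predicted probabilities, which is consistent with the
identical Model I and Model III entries in the tables.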

Table  1  reports  the  mean  of  the  mean  fit-classification  errors.  Model  IV  with 
the  greatest  number  of  covariates  (not  surprisingly)  gives  the  best  fit. 

Table 2 reports the mean of the mean predicted-classification errors. Note that
the 5 fitted logistic regression models are all predicting the same sets of data and
were all estimated using the same sets of data. Thus, the means are comparable.
The best fitting Model IV predicts less well than Models I, II and III. Model II
appears to categorize the normal samples the best but does not categorize the
cancer samples as well as Model I (and III). Thus, Model I (and III) tends to give
more false positives than Model II. However, Model II tends to give more false
negatives than Model I (and III). Since there are on the average 5 cancer
observations to be predicted, the results suggest that on the average one or two
of the cancer observations are misclassified as normal for all of the models. Since
there are on the average 34 normal observations to be predicted, the results
suggest that Models I - III misclassify about one normal observation as cancerous
(1/34 = 0.029).
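
The counts quoted above follow from multiplying the Table 2 error means by the
average number of observations predicted (5 cancer, 34 normal). As a worked
check for Models I and II:

```latex
\begin{align*}
\text{Model I:}\quad  & 5 \times 0.231 \approx 1.2 \ \text{cancer misclassified},
  & 34 \times 0.027 \approx 0.9 \ \text{normal misclassified};\\
\text{Model II:}\quad & 5 \times 0.243 \approx 1.2 \ \text{cancer misclassified},
  & 34 \times 0.025 \approx 0.85 \ \text{normal misclassified}.
\end{align*}
```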

5.  A  Balanced  Simulation  Experiment 

Each  simulation  experiment  consists  of  100  replications.  Each  replication 
consists  of  the  following  operations. 

34 of the 68 normal observations, and 5 of the 10 cancer observations, are
chosen randomly without replacement to be in the data set FD. The other normal
and cancer observations are put in data set PD.

The parameters of the logistic regression models are estimated using the data
in FD. The fitted probability of the observation being cancerous is computed
using (2.2) and the parameter estimates for each observation in FD. The mean
absolute difference between the fitted probability and the sample type (0 =
normal, 1 = cancerous) is computed for all normal observations (respectively all
cancer observations).

The  estimated  parameters  are  then  used  in  (2.2)  to  evaluate  the  predicted 
probability  of  the  sample  being  cancerous  for  each  observation  in  the  data  set  PD. 
The  mean  absolute  difference  between  the  predicted  probability  and  the  sample 
type  is  computed  for  all  normal  observations  (respectively  all  cancer 
observations). 

Once  again  perfect  fit  (respectively  prediction)  would  result  in  a  mean  of  0.  A 
poor  fit  (respectively  prediction)  will  have  a  mean  closer  to  1. 
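
The balanced allocation and the error measure of this section can be sketched
as follows. This is a hedged illustration only: the function names are
hypothetical, and the probabilities passed to the error function would come
from whatever logistic fit is used.

```python
import random

def balanced_split(labels, rng):
    """Put half of each class (chosen without replacement) in FD, the rest in PD.

    labels: list of 0 (normal) / 1 (cancerous). Returns (fd_idx, pd_idx).
    With 68 normal and 10 cancer observations this gives 34 + 5 in FD.
    """
    fd_idx, pd_idx = [], []
    for sample_type in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == sample_type]
        chosen = set(rng.sample(idx, len(idx) // 2))
        fd_idx += [i for i in idx if i in chosen]
        pd_idx += [i for i in idx if i not in chosen]
    return fd_idx, pd_idx

def mean_abs_error_by_class(probs, labels):
    """Mean |probability - sample type| for normal (0) and cancer (1) samples."""
    result = {}
    for sample_type in (0, 1):
        diffs = [abs(p - y) for p, y in zip(probs, labels) if y == sample_type]
        result[sample_type] = sum(diffs) / len(diffs)
    return result
```

Unlike the classification error of Section 3, this measure uses the
probability itself rather than a 0/1 classification, so no cut point is
needed.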

6.  Results  of  Simulation  Experiments 

Tables 3 and 4 report the results of simulation experiments comparing the
logistic regression models with the following covariates.

Model   Covariates

I:      Constant, log (Fapy-A), log (8-OH-A)

II:     Constant, log (Fapy-A/8-OH-A)

III:    Constant, log (Fapy-A/8-OH-A), log ((Fapy-A) x (8-OH-A))

IV:     Constant, log (Fapy-A), log (8-OH-A), log (Fapy-G), log (8-OH-G)

V:      Constant, log (Fapy-A/8-OH-A), log (Fapy-G/8-OH-G)

Tables 5 and 6 report results of simulation experiments comparing the logistic
regression models with the following covariates.

Model   Covariates

Ia:     Constant, log (Fapy-A)

IIa:    Constant, log (8-OH-A)

IIIa:   Constant, log (Fapy-A/8-OH-A)

IVa:    Constant, log (Fapy-A), log (8-OH-A)

Va:     Constant, log (Fapy-A/8-OH-A), log (Fapy-G/8-OH-G)

Once again Model IV offers the best fit. However, it tends to over-predict the
presence of cancer in normal tissue and under-predict the presence of cancer in
cancer tissue. Model I (IVa) with covariates: constant, log (Fapy-A) and log
(8-OH-A) (not surprisingly) has smaller mean absolute fitted error than Model II
(IIIa) with covariates: constant and log (Fapy-A/8-OH-A). In terms of prediction,
the mean of the mean prediction errors for normal samples for Model I (IVa) is
greater than or equal to that for Model II (IIIa). However, the mean of the mean
prediction errors for the cancer samples for Model II (IIIa) is greater than that for
Model I (IVa).

7. Conclusions

In this paper we evaluated the usefulness of logistic regression models in two
ways, goodness-of-fit and ability to predict. The best fitting logistic regression
model did not predict new observations as well as some of the other models.
Two better predicting logistic regression models are

Model   Covariates

I:      Constant, log (Fapy-A), log (8-OH-A)

II:     Constant, log (Fapy-A/8-OH-A)

Both of these models, used as described above, tend to give false negatives in
about 1 out of 5 cancer samples. Both tend to give false positives in about 1 out of
34 normal samples.

Table 1
Fitted Values

Mean of Mean Absolute Fitted Classification Error

Model   Normal Sample                 Cancer Sample
        Mean of Replic. Means         Mean of Replic. Means
        (√Var. of Replic. Means)      (√Var. of Replic. Means)

I       0.011        (0.015)          0.159   (0.145)
II      [illegible]  (0.016)          0.186   (0.143)
III     0.011        (0.015)          0.159   (0.145)
IV      0.000        (0.000)          0.000   (0.000)
V       0.016        (0.019)          0.165   (0.147)

Table 2
Predictions

Mean of Mean Absolute Prediction Classification Error

Model   Normal Sample                 Cancer Sample
        Mean of Replic. Means         Mean of Replic. Means
        (√Var. of Replic. Means)      (√Var. of Replic. Means)

I       0.027        (0.031)          0.231   (0.226)
II      0.025        (0.026)          0.243   (0.228)
III     0.027        (0.031)          0.231   (0.226)
IV      [illegible]  (0.045)          0.265   (0.241)
V       0.030        (0.034)          0.267   (0.249)

Model   Covariates

I:      Constant, log (Fapy-A), log (8-OH-A)

II:     Constant, log (Fapy-A/8-OH-A)

III:    Constant, log (Fapy-A/8-OH-A), log ((Fapy-A) x (8-OH-A))

IV:     Constant, log (Fapy-A), log (8-OH-A), log (Fapy-G), log (8-OH-G)

V:      Constant, log (Fapy-A/8-OH-A), log (Fapy-G/8-OH-G)

Table 3
Fitted Values

Mean of Mean Absolute Fitted Error

Model   Normal Sample                 Cancer Sample
        Mean of Replic. Means         Mean of Replic. Means
        (√Var. of Replic. Means)      (√Var. of Replic. Means)

I       0.0252   (0.0203)             0.171   (0.138)
II      0.0288   (0.0196)             0.196   (0.134)
III     0.0252   (0.0203)             0.171   (0.138)
IV      0.000    (0.000)              0.000   (0.000)
V       0.0271   (0.0205)             0.184   (0.140)

Table 4
Predictions

Mean of Mean Absolute Prediction Error

Model   Normal Sample                 Cancer Sample
        Mean of Replic. Means         Mean of Replic. Means
        (√Var. of Replic. Means)      (√Var. of Replic. Means)

I       0.0548   (0.0448)             0.202   (0.138)
II      0.0486   (0.0329)             0.234   (0.164)
III     0.0548   (0.0479)             0.202   (0.138)
IV      0.0679   (0.0519)             0.209   (0.183)
V       0.0566   (0.0430)             0.235   (0.180)

Model   Covariates

I:      Constant, log (Fapy-A), log (8-OH-A)

II:     Constant, log (Fapy-A/8-OH-A)

III:    Constant, log (Fapy-A/8-OH-A), log ((Fapy-A) x (8-OH-A))

IV:     Constant, log (Fapy-A), log (8-OH-A), log (Fapy-G), log (8-OH-G)

V:      Constant, log (Fapy-A/8-OH-A), log (Fapy-G/8-OH-G)

Table 5
Fitted Values

Mean of Mean Absolute Fitted Error

Model   Normal Sample                 Cancer Sample
        Mean of Replic. Means         Mean of Replic. Means
        (√Var. of Replic. Means)      (√Var. of Replic. Means)

Ia      0.057   (0.021)               0.391   (0.145)
IIa     0.118   (0.009)               0.804   (0.063)
IIIa    0.030   (0.019)               0.202   (0.128)
IVa     0.026   (0.020)               0.176   (0.137)
Va      0.028   (0.020)               0.190   (0.137)

Table 6
Predictions

Mean of Mean Absolute Prediction Error

Model   Normal Sample                 Cancer Sample
        Mean of Replic. Means         Mean of Replic. Means
        (√Var. of Replic. Means)      (√Var. of Replic. Means)

Ia      0.059     (0.030)             0.442   (0.117)
IIa     0.127     (0.023)             0.841   (0.042)
IIIa    0.04001   (0.027)             0.256   (0.163)
IVa     0.04002   (0.028)             0.225   (0.125)
Va      0.045     (0.033)             0.273   (0.192)

Model   Covariates

Ia:     Constant, log (Fapy-A)

IIa:    Constant, log (8-OH-A)

IIIa:   Constant, log (Fapy-A/8-OH-A)

IVa:    Constant, log (Fapy-A), log (8-OH-A)

Va:     Constant, log (Fapy-A/8-OH-A), log (Fapy-G/8-OH-G)

APPENDIX 6


Remarks  Resulting  From  a  Preliminary  Examination  of 
Data  Sent  by  Dr.  L.  Twerdok  of  GEO-CENTERS,  INC. 
at  U.S.  Army  BRDL  in  October  1994 

by 

P.  A.  Jacobs  and  D.  P.  Gaver 

Department  of  Operations  Research 
Naval  Postgraduate  School 
Monterey,  CA  93943 


1.  Introduction 

The  data  consist  of  measurements  made  on  medaka  that  were  sacrificed  at 
different  times. 

The information recorded for each fish includes: the date of the experiment
(which is called the sacrifice date here); the age (in months); the length in
millimeters; the weight in milligrams; percent hematocrit; percent leukocrit; and
the hatch date. The minimum recorded value of leukocrit is 0.01; this value is a
code for "below the limit of detection".

There are missing values, which are coded by the value 100. There are 2
suspect weights: one of 1079 mg for one of the fish sacrificed on 3/16/92 and one
equal to 33 mg for one of the fish sacrificed on 7/25/94.

2.  Graphical  Displays 

Only  the  data  without  any  missing  values  are  considered  for  this  preliminary 
examination  of  the  data.  Since  all  of  the  remaining  sacrifice  dates  are  in  1994,  in 
Figures  1-4  the  sacrifice  dates  are  coded  as  follows:  7/25/94  is  coded  as  725; 
10/05/94  is  coded  as  1005,  etc. 

2.1  Boxplots  by  Hatch  Date 

Figure 1 displays boxplots of log (leukocrit) as a function of the sacrifice date;
the leukocrit values of 0.01 are not considered. There is one figure for each hatch
date. A remarkable feature is the apparent decline in log (leukocrit) for
later sacrifice dates for the 1/27/94 hatch date.

Figure  2  displays  boxplots  of  log  (hematocrit)  as  a  function  of  sacrifice  date. 
There  is  one  figure  for  each  hatch  date.  There  does  not  appear  to  be  as  much 
systematic  variability  of  log  (hematocrit)  between  sacrifice  times  as  there  is  for 
log  (leukocrit);  however  note  that  the  scales  are  different. 

Figure  3  (respectively  Figure  4)  displays  boxplots  of  length  (respectively 
weight)  by  sacrifice  date.  There  is  one  figure  for  each  hatch  date.  There  are 
sometimes  apparent  declines  in  length  and  weight  for  later  sacrifice  times. 

Figures 5-8 display boxplots of measurements by age of the sacrificed fish;
there is one figure for each hatch date. Figure 5 displays
boxplots for log (leukocrit) without the 0.01 values. There is an apparent decrease
in log (leukocrit) with age. Figure 6 displays boxplots of log
(hematocrit). Figure 7 (respectively Figure 8) displays boxplots of length
(respectively weight) by age for each hatch date. Once again there is sometimes
the (surprising?) apparent decrease in length and weight with
age.

Summary.  There  is  the  suggestion  that  the  measurements  are  associated  with 
sacrifice  date  beyond  what  is  expected  to  be  age  dependence.  The  leukocrit 
measurement  seems  to  be  particularly  variable  across  sacrifices.  This  may 
suggest  the  presence  of  a  "sacrifice  effect"  that  could  affect  the  problem  of 
treatment  comparisons  across  sacrifices.  Note  that  no  formal  statistical  analysis 
has  yet  been  conducted  to  quantify  the  strength  of  the  effect. 

2.2  Associations  By  Age 

Figure 9 is a display of scatterplots of log (weight) versus log (length) by age
of fish at time of sacrifice; there is one plot for each fish age. Not surprisingly,
there appears to be an association.

Figure 10 is a display of scatterplots of log (hematocrit) versus log (leukocrit)
by age of fish at time of sacrifice; the leukocrit values of 0.01 are omitted. There is
little consistent association across ages, although there is a hint that log
(leukocrit) and log (hematocrit) increase together at ages 3 and 5, are negatively
or inversely related at ages 6-8, and become essentially unrelated after age 9.

Figure 11 is a display of boxplots of log (hematocrit) by age. An analysis of
variance does not reject the null hypothesis that the means are all equal
(p = 0.89).

Figure 12 is a display of log (leukocrit) by age along with an estimated least
squares straight line; those values of leukocrit = 0.01 have been omitted from
consideration. The estimated least squares line is

    log (leukocrit) =  0.89  -  0.17 (age)        (2.1)
    (s.e.)            (0.16)   (0.02)

The estimated equation is displayed along with estimated standard errors for the
parameter values in parentheses. Since the absolute value of the estimated slope
is more than 2 times its standard error, there is an apparent
decrease in log (leukocrit) with age. However, recall that the level of leukocrit
also appears to be associated with sacrifice date. An age grouping may contain
several sacrifice dates. If there is a "sacrifice effect" this would tend to dilute the
strength of the relationship explored.
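
For readers who want to reproduce this kind of summary, a least squares
straight line with standard errors can be computed as sketched below. This is
an illustration only; the function name is hypothetical and the fish data are
not reproduced here. With the reported estimates, the slope-to-standard-error
ratio is -0.17 / 0.02 = -8.5, well beyond the rough threshold of 2 in absolute
value.

```python
import math

def simple_least_squares(x, y):
    """Fit y = b0 + b1 * x by least squares.

    Returns (b0, b1, se_b0, se_b1), where the standard errors use the
    residual variance estimate with n - 2 degrees of freedom.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return b0, b1, se_b0, se_b1

# Ratio of the reported slope estimate to its reported standard error.
reported_ratio = -0.17 / 0.02
```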

Figure 13 displays boxplots of the residuals from the least squares regression
by sacrifice date. An analysis of variance rejects the null hypothesis that the
residual means are equal for the different sacrifice dates. This is in agreement
with the low $R^2$ discovered in the regression of Figure 12, and tends to support
the hypothesis that there may be a "sacrifice effect".

The following least squares regression model was estimated (omitting the
leukocrit values of 0.01)

    log (leukocrit) = $b_0$ + $b_1$ (age) + $b_2$ (length)

The $R^2$ = 0.31 with residual standard error = 0.60. The parameter estimates (with
estimated standard errors) are

    $b_0$ = 2.7    $b_1$ = -0.14    $b_2$ = -0.07
    (0.44)         (0.02)           (0.02)

An analysis of variance of the residuals from the regression versus sacrifice date
rejects the null hypothesis of equal means across sacrifice date (F = 7.3 with
between-df = 15 and within-df = 198, p = $2 \times 10^{-12}$). Again there seems to be a
source of variability associated with sacrifices.
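
The F statistic of such a one-way analysis of variance (residuals grouped by
sacrifice date) can be computed as in the following sketch. The function name
is hypothetical and the grouped residuals are not reproduced here; the p-value
additionally requires the F distribution, which is not part of the Python
standard library and is therefore left out. The reported degrees of freedom,
15 and 198, are what the formulas below give for 16 groups and 214
observations.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```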

Figure  14  displays  boxplots  of  the  residuals  of  the  regression  versus  sacrifice 
date.  Note  the  apparent  decline  in  the  residuals  for  the  last  cluster  of  experiments 
conducted  from  10/3/94  through  10/18/94. 

3.  Preliminary  Conclusions 

There  does  not  appear  to  be  an  association  between  age  and  log  (hematocrit) 
level.  A  least  squares  linear  regression  (2.1)  suggests  that  there  may  be  an 
association  between  age  and  leukocrit.  However,  analysis  of  variance  of  the 
residuals  from  the  least  squares  regression  suggests  an  association  between 
leukocrit  and  the  sacrifice  date.  This  association  may  be  due  to  the  procedure 
used  to  measure  leukocrit;  it  may  also  be  due  to  differences  in  water  quality  and 
other  physical  factors  in  the  experiment  and  the  health  of  the  animals.  If  there  are 
sacrifice  effects  they  will  tend  to  dilute  the  strength  of  associations  between  other 
measured  variables  such  as  age,  hematocrit,  leukocrit,  length,  weight,  etc.  Since 
we  have  been  unable  to  discover  a  biological  reason  for  a  relationship  between 
leukocrit  value  and  age  in  adult  animals,  we  suspect  that  age  is  a  surrogate  for 
some  other  effect. 

[Figures not reproduced: one figure for each hatch date; one box plot for each age.]

MATHEMATICAL MODELING OF FREE RADICAL ($O_2^-$)
PRODUCTION STIMULATION BY PMA


D. P. Gaver
P. A. Jacobs


1.  Purpose. 

The  objective  of  this  note  is  to  provide  a  preliminary  simple  mathematical 
model  to  describe,  in  a  quantitative  way,  the  behavior  of  data  exhibited  by 
Dr.  Judith  Zelikoff  at  a  recent  USABRDL  Research  Project  Review  (Sept.  20-21, 
1994).  The  data,  shown  in  graphical  form,  appeared  qualitatively  as  follows: 


[Figure 1: qualitative sketch of the data; vertical axis: observed concentration
of radical.]


The radical production was stimulated by the introduction of an initial dose of
PMA at time zero; the resulting production was made visible or observable by
surrounding the cells so exposed with a solution of luminol.

It was shown experimentally that quantitative features of the above dose
response curve changed with experimental conditions such as the PMA dose
(presumably measured by its concentration and the amount of solution
introduced, plus other experimental features and conditions).


2.  Simplest  Mathematical  Model. 

Denote  the  various  quantities  present  as  follows: 

$P(t)$ = amount of PMA (or perhaps another substance, such as a
toxicant or chemical) present at time $t$ that has not yet interacted
with a cell;

$R(t)$ = amount of radical, $O_2^-$, present, the appearance of which has
been caused by PMA;

$C$ = constant cell population size.

Here is a flow diagram for the interaction envisioned:


[FIGURE 2: flow diagram, not reproduced]


It is understood that when the radical is formed it becomes (bio)luminescent in
the presence of luminol; this material is present at all times and does not change
in concentration. The luminescence is assumed to be proportional to $R(t)$ (this
may not be strictly correct, J.Z. to modify any of the above).

First  Model. 

If it is assumed that the cell population is large, and essentially unchanged by
interaction with PMA, then the latter, which is introduced in amount $P(0)$ at
$t = 0$, reduces as follows in time $(t, t + \Delta)$:

$$P(t + \Delta) = P(t) - \lambda P(t) C \Delta. \tag{1}$$

Passing to the limit as $\Delta \to 0$ this differential equation results

$$\frac{dP}{dt} = -\lambda C P(t) \tag{2}$$

which gives

$$P(t) = P(0) e^{-\lambda C t}. \tag{3}$$


Next  consider  radical  production  and  "death",  i.e.  the  presumed  cessation  of 
radical  production  by  a  cell  after  initial  stimulation. 


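$$R(t + \Delta) = R(t)
  + \underbrace{\lambda P(t) C \Delta}_{\text{Radical Production}}
  - \underbrace{\mu R(t) \Delta}_{\text{Radical Reduction in time; ``death''}}. \tag{4}$$

This gives

$$\frac{dR}{dt} = \lambda P(t) C - \mu R(t)
  = \lambda C P(0) e^{-\lambda C t} - \mu R(t). \tag{5}$$

This has the solution (by standard integrating factors)

$$R(t) = R(0) e^{-\mu t}
  + P(0) \frac{\lambda C}{\lambda C - \mu}
    \left( e^{-\mu t} - e^{-\lambda C t} \right). \tag{6}$$

Very possibly, $R(0)$, the initial number of luminescent cells, is negligible, i.e.
$R(0) = 0$. In this case

$$R(t) = P(0) \frac{\lambda C}{\lambda C - \mu}
    \left( e^{-\mu t} - e^{-\lambda C t} \right). \tag{7}$$

The graph of this function appears much like Figure 1; the time at which a
maximum is reached can be derived (differentiate and equate to zero and solve):

$$t_m = \frac{1}{\lambda C - \mu} \ln\!\left( \frac{\lambda C}{\mu} \right); \tag{8}$$

the height of the maximum, $R(t_m)$, is obtained by substitution:

$$R(t_m) = P(0) \frac{\lambda C}{\lambda C - \mu}
    \left[ \left( \frac{\mu}{\lambda C} \right)^{\mu/(\lambda C - \mu)}
         - \left( \frac{\mu}{\lambda C} \right)^{\lambda C/(\lambda C - \mu)} \right]. \tag{9}$$

This formula must be studied numerically.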
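
A numerical study of the first model can start from a sketch such as the one
below. It assumes $R(0) = 0$ and the solution
$R(t) = P(0)\,\frac{\lambda C}{\lambda C - \mu}\,(e^{-\mu t} - e^{-\lambda C t})$
with $\lambda C \neq \mu$; the function names, the combined parameter `lam_c`
(standing for the product $\lambda C$), and the parameter values in the usage
note are hypothetical.

```python
import math

def radical(t, p0, lam_c, mu):
    """R(t) for R(0) = 0; lam_c is the product lambda * C (lam_c != mu)."""
    return p0 * lam_c / (lam_c - mu) * (math.exp(-mu * t) - math.exp(-lam_c * t))

def time_of_max(lam_c, mu):
    """Time at which R(t) peaks: ln(lam_c / mu) / (lam_c - mu)."""
    return math.log(lam_c / mu) / (lam_c - mu)

def max_height(p0, lam_c, mu):
    """Peak value R(t_m), obtained by substituting the peak time into R(t)."""
    return radical(time_of_max(lam_c, mu), p0, lam_c, mu)
```

For instance, with the made-up values `p0 = 100`, `lam_c = 2.0` and
`mu = 0.5`, the peak occurs at `time_of_max(2.0, 0.5)`, which is
ln(4)/1.5. Carrying the substitution through algebraically, the peak height
reduces to $P(0)\,(\mu/\lambda C)^{\mu/(\lambda C - \mu)}$, which the code
reproduces numerically.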


Second  Model  (Stochastic/Probabilistic  Version  of  First).  Towards  Likelihood 
Estimation  of  Parameters. 

Suppose we think of the setup as that of a number of PMA "particles"
wandering in the cell vicinity at random. Assume that a particle avoids a cell for
time $t$ with probability $e^{-\lambda t}$. If it avoids all cells independently then the
probability that it does not hit any of the $C$ cells in time $t$ is $e^{-\lambda C t}$. If $P(0)$ is the
initial number of PMA particles introduced, then, assuming independence of
particles, the mean number of still-wandering particles at time $t$ (those that
have not yet hit cells and initiated radical production) is $P(0) e^{-\lambda C t}$, which is
exactly $P(t)$ as calculated in (3).

The probability density of time for a PMA collision with a (some) cell is

$$f(t) = e^{-\lambda C t} \lambda C.$$

Suppose that a cell remains bioluminescent after collision for time $y$ with
probability $e^{-\mu y}$. Consequently the probability that a single cell is bioluminescent
at any time $t$ is the probability that (a) the cell is hit by a PMA particle at $x < t$, and
(b) remains bioluminescent for time $t - x$; the latter must be "added up" or
integrated over $0 < x < t$:

$$\text{Prob(Cell is biolumin. at time } t)
  = \int_0^t \lambda C e^{-\lambda C x} e^{-\mu (t - x)} \, dx
  = \frac{\lambda C}{\lambda C - \mu} \left( e^{-\mu t} - e^{-\lambda C t} \right).$$

The mean number of bioluminescent cells is thus

$$\text{Mean Number Bioluminescent Cells at } t
  = P(0) \frac{\lambda C}{\lambda C - \mu} \left( e^{-\mu t} - e^{-\lambda C t} \right),$$

which is precisely what was obtained earlier in (7). A bonus is that according to
this model (which may correspond only approximately to reality) the actual
random number of bioluminescent cells at $t$, $R(t)$, is binomially distributed with
trial number $P(0)$ and probability of bioluminescence

$$\frac{\lambda C}{\lambda C - \mu} \left( e^{-\mu t} - e^{-\lambda C t} \right).$$

This allows one to write down formal statistical estimation equations (maximum
likelihood method) for parameters such as $\lambda$, $\mu$ and perhaps an "effective $P(0)$".
Having these would provide a quantitative dose-response relationship.
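
The stochastic version can be checked by simulation, and the binomial form
gives a log-likelihood directly. The sketch below is illustrative only and
rests on the assumptions stated in the text: each particle hits some cell
after an exponential time with rate $\lambda C$, and the resulting
luminescence lasts an exponential time with rate $\mu$. The function names,
the combined parameter `lam_c` (for $\lambda C$) and all numerical values are
hypothetical.

```python
import math
import random

def prob_luminescent(t, lam_c, mu):
    """Probability that a given particle has produced luminescence still
    present at time t (lam_c = lambda * C, lam_c != mu)."""
    return lam_c / (lam_c - mu) * (math.exp(-mu * t) - math.exp(-lam_c * t))

def simulate_count(t, n_particles, lam_c, mu, rng):
    """Simulate the number of luminescent cells at time t."""
    count = 0
    for _ in range(n_particles):
        hit = rng.expovariate(lam_c)          # time of collision with a cell
        if hit <= t and hit + rng.expovariate(mu) > t:
            count += 1                        # hit before t, still luminescent
    return count

def binomial_log_likelihood(r, n, p):
    """Log of the binomial probability of r successes in n trials (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
            + r * math.log(p) + (n - r) * math.log(1.0 - p))
```

Summing `binomial_log_likelihood(r_i, n, prob_luminescent(t_i, lam_c, mu))`
over observation times and maximizing over `lam_c` and `mu` would be one way
to set up the estimation described above; treating observations at different
times as independent is an additional simplifying assumption, not something
the model guarantees.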


Design  of  Observations. 

It appears likely that observations or measurements of bioluminescence will
be made at discrete intervals of time, possibly at

$$t_1 < t_2 < t_3 < \cdots < t_k$$

at which times we would get corresponding measured values $r(t_1) = r_1$, $r(t_2) = r_2$,
... etc. Question: if the times, $t_i$, can be chosen deliberately, what should their
values be? Choice of certain values will give better parameter estimates than
others; in other words we seek to optimize (minimize) estimation error within
the constraints of time and experimental cost. This can be studied quantitatively,
perhaps by simulation, in advance of actual experimentation, provided rough values
for the parameters are available. This problem and opportunity will be discussed
in more detail later.

IN THE PRESENCE OF REPAIR


1. Problem Formulation

Suppose an idealized organ, e.g. liver, is made up of originally homogeneous
elements, i.e. cells, that cycle (die and replicate) under the (temporary) constraint
that their number is fixed. Let them initially and thereafter be subject to a dose
stimulus that tends to change normal cells to adduct-ridden cells; thus at time $t$
there are $D_o(t)$ normal cells and $D_a(t)$ adducted cells, and $D_o(t) + D_a(t) = C$, $C$
being (temporarily) a constant, e.g. ~10^[exponent illegible] as in human liver.

Assume  that  the  presence  of  the  adducted  cells  induces  the  normal  cells  to 
supply  a  repair  agent  that  effectively  produces  an  enhanced  adducted  cell  death 
rate,  i.e.  stimulates  apoptosis.  This  is  the  first  step  in  population  repair.  What  is 
the  behavior  of  the  resulting  population  of  adducted  —  and  hence  also  normal  — 
cells? 

An  extremely  simple  —  probably  vastly  over-simplified  —  mathematical 
model  is  suggested,  followed  by  some  comments  and  alternatives.  There  are 
many!  Suggestions  are  solicited. 

2.  Specific  Mathematical  Details;  Model  I 

Let $R(t)$ denote the concentration of the specific adducted cell enemy. Perhaps
it is alkyl-DNA transferase "capable of removing ethyl groups from the
$O^6$-position of guanine . . .", see Burkhart and Mailing (1993).

Model I


Now assume that the net rate of change of $R(t)$ is

$$\frac{dR}{dt} = \rho D_a(t) - \nu R(t). \tag{2.1}$$

We call the model incorporating (2.1) Model I. The first, leftmost, term on the
right-hand side (tries to) represent the rate of production of repair-inducing
agent; the second represents the rate at which that material leaves the organ
either by metabolism or binding or transfer out by blood circulation or . . .?

•  Comments  on  (2.1):  There  Are  Many  Alternatives 

Mathematical  modeling  forces  specificity,  which  stimulates  experimentation 
to  determine  more  likely  alternative  mechanisms.  The  above  equation  is 
therefore  tentative,  being  precisely  wrong  but  perhaps  usefully  adequate. 

(a) The repair material creation term, $\rho D_a(t)$, may be plausible if it is viewed
as a pure source term from adducted cells. But what if adducted cells are
signalling non-adducted (normal) cells to produce the material? This may well
be more biologically plausible and interesting, and its consequences will be
investigated subsequently. Then a more appropriate term might be
$\rho D_a(t) D_o(t) = \rho D_a(t)[C - D_a(t)]$; the second factor of the new expression
(qualitatively) represents the possibility that if $D_a(t)$ gets large, there are
no recipients for a signal, and the adduct repair or healing agent cannot be
produced. Presumably the organ is now in a more endangered state. We call the
model that replaces $\rho D_a(t)$ by $\rho D_a(t)[C - D_a(t)]$ Model II, and take it up later.

(b) If adducts occur in a clonal, packed condition then there is the
possibility that linear dependence on $D_a(t)$ in repair creation or signaling is
inappropriate: a better model might use $D_a^p(t)$, where $p < 1$, e.g. 1/2 or 1/3, to
represent something like surface-exposed adducted cells.

The  second  equation  given  is  for  the  net  rate  of  adducted-cell  increase: 

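$$\frac{dD_a(t)}{dt} =
\underbrace{\lambda_{oa}[C - D_a(t)]}_{\text{recruitment of normal cells to adducted}}
+ \underbrace{\lambda_{aa} D_a(t)}_{\text{adducted cell birth}}
- \underbrace{\delta_a D_a(t)}_{\text{adducted cell death}}
- \underbrace{\delta_{ar} R(t) D_a(t)}_{\text{adducted cell death, hence "repair" of tissue}}.
\tag{2.2}$$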

The  notations  below  the  terms  represent  the  —  hypothetical  —  effects  modeled 
by  each  term.  Again  there  are  more  alternatives. 

•  Comments on (2.2)

(a) One can allow all parameters of the adduct-cell changes to be time
dependent, in particular dependent on organ age. If the organ grows it is
presumably mostly in the context of a young host animal becoming older. Such
growth is inconsistent with a constant $C$-value. Growing organs will be modeled
later.

(b) Dimensional or morphological considerations may well force changes.
For example, the repair term, $\delta_{ar} R(t) D_a(t)$, might be better replaced by
$\delta_{ar} R(t) D_a^p(t)$, and $\lambda_{aa} D_a(t)$ by $\lambda_{aa} D_a^p(t)$ in Model I if clonal expansion is a
surface effect.

(c) In neither (2.1) nor (2.2) has any attention been given to physical
dimensions, except implicitly. $R(t)$ and $D_a(t)$ are, of course, not of the same
dimension.


3.  Implications 

Equations (2.1) and (2.2) are first-order non-linear differential equations that
can be solved numerically. Such solutions are the only recourse if parameters are
time dependent. However, some information can be obtained in parametric form
under certain conditions.
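As a rough illustration of the numerical route, the following sketch integrates (2.1) and (2.2) with a fixed-step fourth-order Runge-Kutta scheme. It is not taken from the report: the function names and every parameter value are hypothetical, chosen only so that the run settles to an interior steady state, and the rates are held constant for simplicity.

```python
def model1_rhs(R, D, p):
    """Right-hand sides of (2.1) and (2.2) for Model I."""
    dR = p["rho"] * D - p["delta_r"] * R
    dD = (p["lam_oa"] * (p["C"] - D) + p["lam_aa"] * D
          - p["delta_a"] * D - p["delta_ar"] * R * D)
    return dR, dD


def integrate_model1(R0, D0, p, t_end, dt=0.01):
    """Classical fixed-step RK4; returns (R, D) at time t_end."""
    R, D = R0, D0
    for _ in range(int(round(t_end / dt))):
        k1R, k1D = model1_rhs(R, D, p)
        k2R, k2D = model1_rhs(R + 0.5 * dt * k1R, D + 0.5 * dt * k1D, p)
        k3R, k3D = model1_rhs(R + 0.5 * dt * k2R, D + 0.5 * dt * k2D, p)
        k4R, k4D = model1_rhs(R + dt * k3R, D + dt * k3D, p)
        R += dt * (k1R + 2.0 * k2R + 2.0 * k3R + k4R) / 6.0
        D += dt * (k1D + 2.0 * k2D + 2.0 * k3D + k4D) / 6.0
    return R, D


# Hypothetical, dimensionless parameter values (assumptions, not data).
params = dict(lam_oa=0.01, lam_aa=0.10, delta_a=0.05,
              delta_ar=0.02, rho=0.5, delta_r=1.0, C=100.0)

# Start from an organ with no adducted cells and no repair agent.
R_end, D_end = integrate_model1(0.0, 0.0, params, t_end=400.0)
print(R_end, D_end)
```

Time-dependent rates could be handled in the same loop by letting `model1_rhs` take the current time as an extra argument; a library integrator with adaptive steps would be the more usual choice for serious work.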

•  Chronic  Exposure  and  Steady  State 

Suppose the exogenously generated toxic exposure, which affects $\lambda_{oa}$, is
constant over time. This should mean that $\lambda_{oa}$ is also constant.
Assume also that all other rates are constant. Then when time grows long
($t \to \infty$) a steady-state fraction of adducted cells will result, as will a
steady-state concentration of adduct repair agent; let these be the constants
$D_a$ and $R$ respectively. They satisfy the equations (2.1)' and (2.2)'
respectively, which are (2.1) and (2.2) with left-hand side derivatives equated
to zero. Immediately $R = (\rho/\delta_r) D_a$ from (2.1)' and $D_a$ now satisfies the
quadratic equation

$$0 = \lambda_{oa}[C - D_a] + \lambda_{aa} D_a - \delta_a D_a - \delta_{ar}(\rho/\delta_r) D_a^2. \tag{3.1}$$

This quadratic can be solved explicitly for $D_a$ in terms of all the parameters;
the relevant root is the positive one.
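Applying the standard quadratic formula to (3.1) gives the root explicitly. The shorthand symbols $a$ and $b$ below are introduced here only for compactness and are not part of the original notation.

```latex
% Shorthand (assumed, not in the original):
%   a = \lambda_{aa} - \lambda_{oa} - \delta_a ,   b = \delta_{ar}\rho/\delta_r
0 = \lambda_{oa} C + a D_a - b D_a^2
\quad\Longrightarrow\quad
D_a = \frac{a + \sqrt{a^2 + 4 b \lambda_{oa} C}}{2b}.
```

Because the product of the two roots, $-\lambda_{oa} C / b$, is negative, exactly one root is positive, and it is the one shown. It applies as long as it does not exceed $C$.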


Insight is furnished by a graphical display. Write (3.1) as "birth-vs-death":

$$[\lambda_{oa}(C - D_a)] + \lambda_{aa} D_a = \delta_a D_a + \delta_{ar}(\rho/\delta_r) D_a^2 \tag{3.2a}$$

or, for convenience,

$$L(D_a; \lambda_{oa}, \lambda_{aa}) = R(D_a; \delta_a, \delta_{ar}, \rho, \delta_r) \tag{3.2b}$$

and plot the left-hand side and the right-hand side vs. $D_a$, mindful of the
constraint $0 \le D_a \le C$. It is helpful to plot the bracketed term, $[\cdot]$, of (3.2a)
and the term $+\lambda_{aa} D_a$ separately to see that the net effect of increasing either or
both of $\lambda_{oa}$, the adducted cell recruitment rate, and $\lambda_{aa}$, the adducted cell
reproduction or growth rate, is to increase $L(D_a; \cdot)$ at every $D_a$-value. The
solution is at $L(D_a^*) = R(D_a^*)$, where the curves cross; otherwise the solution
is at $C$.


Figure  1 


Since $R(D_a)$ is an upward rising parabola there will always be a solution value
$D_a^*$ inside $[0, C]$ unless $R(C) < L(C)$, in which case the organ is eventually
taken over by adducts, i.e. $D_a^* = C$. (Before this occurs other, more serious,
events presumably befall the animal host, but these are not modeled.) A brief
check shows that $D_a^*$ is, ceteris paribus, monotonically increasing in $\lambda_{oa}$ and
$\lambda_{aa}$ separately. On the other hand, the parabolic $R(D_a)$ shifts higher if $\delta_a$,
the background adducted cell death rate, or $\delta_{ar}$, the induced extra death rate,
increases; either change monotonically decreases $D_a^*$. The same effect also
results if $\rho$, the rate of production of repair agent, increases; $D_a^*$ also
decreases with reduction of $\delta_r$, the rate at which the effect of $R(t)$ diminishes.
All effects are intuitive, taken individually.

Note, however, that the effect of a chemical toxin in the vicinity of the organ's
cells, not explicitly modeled, may simultaneously affect the various parameters
above in quite different ways. It may increase $\lambda_{oa}$ and $\lambda_{aa}$, as might be
expected, but it may also increase $\rho$, the rate of creation of repair agent, say,
more than enough to compensate, so that the adducted-cell response to toxin $T$,
$D_a^*(T)$, could actually decrease for some range of $T$; possibly ultimately the cell
repair capacity would fail and the toxin would win out. The above mechanism
could easily create a non-monotonic dose-response function, or give rise to the
elusive "hormesis" phenomenon. The same effect might be achieved by reducing
$\delta_r$, the rate of repair agent removal. This suggests the importance of
understanding the effect of a chemical toxin on all components of the mechanism
governing the system.
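The non-monotonic possibility can be made concrete with a small sketch. The steady state is the positive root of (3.1); the two dose dependences, `lam_oa = 0.001 * T` and a saturating `rho`, are purely hypothetical choices made for illustration, as are all numerical values. For simplicity the sketch holds $\lambda_{aa}$ fixed.

```python
import math


def steady_state_model1(lam_oa, lam_aa, delta_a, delta_ar, rho, delta_r, C):
    """Positive root of the quadratic (3.1), capped at the organ size C."""
    b = delta_ar * rho / delta_r
    a = lam_aa - lam_oa - delta_a
    root = (a + math.sqrt(a * a + 4.0 * b * lam_oa * C)) / (2.0 * b)
    return min(root, C)


def dose_response(T):
    """Steady-state adducted cell level at toxin level T (hypothetical forms)."""
    lam_oa = 0.001 * T                        # assumed: recruitment grows linearly
    rho = 0.5 * (1.0 + 3.0 * T / (1.0 + T))   # assumed: repair induction saturates
    return steady_state_model1(lam_oa, 0.10, 0.05, 0.02, rho, 1.0, 100.0)


for T in (0.0, 1.0, 10.0, 100.0):
    print(T, dose_response(T))
```

With these assumed forms the response first falls below its zero-dose level, because repair induction outpaces recruitment, and then rises above it once induction has saturated.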


4.  Approximate  Time-Dependent  Behavior  of  Model  I 

It is of interest to trace the development of the adducted cell population as it
evolves in time. This can be done explicitly if the quasi-stationary assumption for
$R(t)$ is satisfied, i.e. if owing to relatively fast interaction of $R$ with $D_a$ we can
assume in (2.1) that $dR(t)/dt = 0$. Such an approach has been studied carefully by
Segel and Slemrod (1989); it is used to justify the familiar Michaelis-Menten
formula automatically invoked in much pharmacokinetic work. If this
assumption is taken to be valid we obtain the single non-linear differential
equation for $D_a(t)$:

$$\frac{dD_a(t)}{dt} = \lambda_{oa}[C - D_a(t)] + \lambda_{aa} D_a(t) - \delta_a D_a(t) - \delta_{ar}(\rho/\delta_r) D_a^2(t). \tag{4.1}$$

The above is recognized as a non-homogeneous Riccati equation that admits an
explicit solution; the latter will approach the previous long-run solution,
provided one is mindful of the constraint $0 \le D_a(t) \le C$. Note that this constraint
must be imposed because of the possible influence of the positive term $\lambda_{aa} D_a(t)$,
representing an adducted cell proliferation process that does not recognize the
inhibiting presence of the organ boundary at $C$. The boundary can be made gentle
or soft, or natural, but in fact the model is not expected to apply when $D_a(t)$ is
anywhere near organ size: long before that, the host would be dead.


•  A Toxic Bolus Dosage: An Initial Number of Adducted Cells Given

It is common to ask for the effect of an initial chemical bolus dosage, or single
acute exposure. This outcome can be modeled by removing the first right-hand
side source term, $\lambda_{oa}[C - D_a(t)]$, of (4.1) and solving the resulting homogeneous
equation, where the initial condition $D_a(0)$ is the adduct load delivered by the
initial dose. An explicit, if complicated-looking, solution results:


(4.2) 


1  +  ’ 

See Appendix 1 for the explicit forms of $K_1$ and $K_2$. Note that in the present
situation $D_a(t)$ can either increase initially and later drop to zero,
with a single maximum at $t_m$ ($t_m > 0$), where


~  i^aa  ~  ^t^l^ar  {Pl^r) 

if  >  4/  or  otherwise  drop  monotonically  to  zero  if  ^aa  -  5a  -  6ar(p/Sr)Da(0)  <  0. 
If desired a single overall summary measure of adduct cell presence, a response to
the chemical dosage giving $D_a(0)$, can be obtained by integrating (4.1), which can
be carried out explicitly. This step is omitted for the moment.
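A quick numerical cross-check of the homogeneous (bolus) equation can be sketched as follows. Writing $a = \lambda_{aa} - \delta_a$ and $b = \delta_{ar}\rho/\delta_r$ (shorthand introduced here, not in the original), the equation is $dD_a/dt = aD_a - bD_a^2$. The closed form below is one algebraically equivalent way of writing the logistic solution obtained by separation of variables; the function names and numerical values are illustrative assumptions.

```python
import math


def bolus_closed_form(t, D0, a, b):
    """Solution of dD/dt = a*D - b*D**2 with D(0) = D0, assuming a != 0."""
    return a * D0 / ((a - b * D0) * math.exp(-a * t) + b * D0)


def bolus_numeric(t_end, D0, a, b, dt=0.01):
    """Fixed-step RK4 integration of the same equation, for comparison."""
    def f(D):
        return a * D - b * D * D

    D = D0
    for _ in range(int(round(t_end / dt))):
        k1 = f(D)
        k2 = f(D + 0.5 * dt * k1)
        k3 = f(D + 0.5 * dt * k2)
        k4 = f(D + dt * k3)
        D += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return D


# Hypothetical values: a = 0.05 (birth exceeds background death), b = 0.01.
print(bolus_closed_form(50.0, 1.0, 0.05, 0.01), bolus_numeric(50.0, 1.0, 0.05, 0.01))
```

In this sketch the closed form tends to $a/b$ when $a > 0$ and decays to zero when $a < 0$, so the sign of $\lambda_{aa} - \delta_a$ decides the long-run fate of the bolus under the quasi-stationary assumption.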

5.  Model II; An Alternative That Emphasizes Signaling

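It was noted earlier that a quite plausible alternative model for induction of
repair agent $R$ appears in attempts to model the effect of cell signalling. With
the quasi-stationary assumption applied to the Model II source term, the
equation for $D_a(t)$ becomes

$$\frac{dD_a(t)}{dt} = \lambda_{oa}(C - D_a(t)) + \lambda_{aa} D_a(t) - \delta_a D_a(t) - \delta_{ar}(\rho/\delta_r) D_a^2(t)(C - D_a(t)). \tag{5.1}$$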

See  Section _ ,  paragraph _ for  discussion. 

•  Steady-State:  Possible  Bistability 

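Here let $D_a$ be the long-run mean number of adducted cells as before and

$$L(D_a; \lambda_{oa}, \lambda_{aa}) = \lambda_{oa}(C - D_a) + \lambda_{aa} D_a
= \delta_a D_a + \delta_{ar}(\rho/\delta_r) D_a^2 (C - D_a)
= R(D_a; \delta_a, \delta_{ar}, \rho, \delta_r, C). \tag{5.2}$$

While the graph of $L(D_a)$ remains the same, that for $R(D_a)$ changes;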


Figure  2 

Now if $D_a(t) < D_a^*(1)$ then $L(D_a(t)) > R(D_a(t))$ and the derivative
$dD_a(t)/dt > 0$, moving $D_a(t)$ towards $D_a^*(1)$. If $D_a^*(1) < D_a(t) < D_a^*(2)$,
$dD_a(t)/dt < 0$, reducing the value of $D_a(t)$ towards $D_a^*(1)$. Hence $D_a^*(1)$ is a
stable equilibrium point, reached eventually by any path $\{D_a(t), t \ge 0\}$ that
starts with $D_a(0) < D_a^*(2)$. On the other hand, if $D_a^*(2) < D_a(t) < C$ the
derivative is always positive and the process gravitates towards $D_a^* = C$ in the
long run. There are thus two stable points or long-run (mean) numbers of
adducted cells: a (comparatively) low value, here $D_a^*(1)$, and a high value,
$D_a^* = C$. In the present simple theory either one can be reached, depending upon
the initial conditions (and the parameter values in effect): $0 < D_a(0) < D_a^*(2)$
guarantees that $D_a(t) \to D_a^*(1)$, wandering throughout $[0, D_a^*(2)]$ while doing so;
$D_a^*(2) < D_a(0) < C$ ensures that the adducted cells eventually take over the
organ. This may perhaps imply the existence of a threshold with respect to
response to $D_a(0)$, itself the effect of dosage: ceteris paribus "small" toxin doses,
i.e. those giving $D_a(0) < D_a^*(2)$, will induce "small" numbers of adducted cells that
remain low in number and of little threat, whereas "large" doses, with
$D_a(0) > D_a^*(2)$, will tend to foster an ever increasing adducted cell population in
the organ.
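The threshold behavior can be illustrated numerically. The sketch below locates the interior equilibria of (5.1) by bracketing sign changes and bisecting, then follows two starting values with a simple Euler scheme that holds $D_a$ at the boundary $C$ once it is reached. The combined constant `b` stands for $\delta_{ar}\rho/\delta_r$; all function names and parameter values are hypothetical, chosen so that two interior equilibria exist.

```python
def model2_rate(D, lam_oa, lam_aa, delta_a, b, C):
    """Right-hand side of (5.1), with b = delta_ar * rho / delta_r."""
    return lam_oa * (C - D) + lam_aa * D - delta_a * D - b * D * D * (C - D)


def interior_equilibria(rate, C, n_grid=1000, tol=1e-10):
    """Bracket sign changes of rate on a grid over [0, C]; refine by bisection."""
    roots = []
    xs = [C * i / n_grid for i in range(n_grid + 1)]
    for lo, hi in zip(xs[:-1], xs[1:]):
        if rate(lo) == 0.0:
            roots.append(lo)
        elif rate(lo) * rate(hi) < 0.0:
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if rate(lo) * rate(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


def long_run(D0, rate, C, t_end=500.0, dt=0.01):
    """Euler integration of dD/dt = rate(D), constrained to [0, C]."""
    D = D0
    for _ in range(int(round(t_end / dt))):
        D = min(C, max(0.0, D + dt * rate(D)))
    return D


# Hypothetical parameter values (assumptions, not data).
p = dict(lam_oa=0.01, lam_aa=0.10, delta_a=0.05, b=1e-4, C=100.0)
rate = lambda D: model2_rate(D, **p)

roots = interior_equilibria(rate, p["C"])   # [D*(1), D*(2)] for these values
low = long_run(50.0, rate, p["C"])          # starts below D*(2)
high = long_run(97.0, rate, p["C"])         # starts above D*(2)
print(roots, low, high)
```

With these values a start at 50 settles at the lower equilibrium, while a start at 97 runs up to $C$, matching the qualitative picture described above.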

6.  A  Stochastic  Version  of  the  Process 

There  are  many  chance-affected  elements  of  the  above  process  capable  of 
making  it  stochastic.  For  instance,  differences  between  replicate  animals  might  be 
represented  by  random  choice  of  all  governing  parameters  from  suitable  joint 
distributions,  or  dose  or  exposure  itself  may  vary  randomly  around  a  desired 
level.  But  a  natural  first  step  is  to  simply  suppose  that  the  elementary  processes 
such  as  adducted  cell  import,  birth,  and  death  are  "random"  in  the  sense  of  birth- 
and-death  processes;  cf.  Feller  (1968)  or  Karlin  and  Taylor  (1975). 

•  Naive Conversion to Birth-Death

Let now $D_a(t)$ be the random state variable of a one-dimensional Markov
process, and define the generator as follows, using the terms of (5.1) for guidance.
Given $D_a(t)$, write

$$P\{D_a(t + dt) = d + 1 \mid D_a(t) = d\} = [\lambda_{oa}(C - d) + \lambda_{aa} d]\,dt = \lambda_d\,dt \tag{6.1}$$

and

$$P\{D_a(t + dt) = d - 1 \mid D_a(t) = d\} = [\delta_a d + \delta_{ar}(\rho/\delta_r) d^2 (C - d)]\,dt = \mu_d\,dt. \tag{6.2}$$

The fact that the transition functions are non-linear means that the expected
value of the random quantity $D_a(t)$ will not satisfy the deterministic equation
(5.1); the discrepancy is caused by the inevitable higher moments. One trick
useful for "closing off" this annoying "moment-creep" is to replace the higher
moments by their equivalent for a Gaussian process; see Whittle (1957), and
Isham (1991). This method often works well.

Neglect  the  above  problem  for  simplicity  and  write  down  the  stationary 
distribution  of  the  Markov  chain  in  continuous  time  that  portrays  Da(t)  as 
specified.  There  clearly  is  such  a  stationary  distribution  (finite  state  space,  all 
states  communicate) 


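$$\pi_d = \pi_0\,\frac{\lambda_0 \lambda_1 \cdots \lambda_{d-1}}{\mu_1 \mu_2 \cdots \mu_d} = \pi_{d-1}\,\frac{\lambda_{d-1}}{\mu_d}.$$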
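The product form of the stationary distribution is easy to evaluate numerically. The sketch below builds the weights in log space to avoid overflow and then normalizes. It assumes an integer organ size `C` and, so that the state space really is finite, sets the birth rate out of state `C` to zero; that truncation, the function name, and all parameter values are assumptions made for illustration.

```python
import math


def stationary_distribution(C, lam_oa, lam_aa, delta_a, b):
    """Stationary probabilities pi_0..pi_C for rates (6.1)-(6.2).

    b stands for delta_ar * rho / delta_r.
    """
    def birth(d):
        # Assumption: no births out of state C, keeping the chain on {0,...,C}.
        return lam_oa * (C - d) + lam_aa * d if d < C else 0.0

    def death(d):
        return delta_a * d + b * d * d * (C - d)

    # log of lambda_0 ... lambda_{d-1} / (mu_1 ... mu_d), built recursively
    log_w = [0.0]
    for d in range(1, C + 1):
        log_w.append(log_w[-1] + math.log(birth(d - 1)) - math.log(death(d)))

    shift = max(log_w)
    w = [math.exp(x - shift) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


# Hypothetical parameter values (assumptions, not data).
pi = stationary_distribution(C=100, lam_oa=0.01, lam_aa=0.10, delta_a=0.05, b=1e-4)
mode = pi.index(max(pi))
print(mode, pi[mode])
```

The recursion mirrors the relation $\pi_d = \pi_{d-1}\lambda_{d-1}/\mu_d$ directly, so the probabilities rise while $\lambda_{d-1} > \mu_d$ and fall otherwise.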
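Separate variables in (4.1) without the term $\lambda_{oa}[C - D_a(t)]$:

$$\frac{dD_a(t)}{\lambda_{aa} D_a(t) - \delta_a D_a(t) - \delta_{ar}(\rho/\delta_r) D_a^2(t)} = dt. \tag{A1.1}$$

Rewriting (A1.1),

$$dD_a(t)\left[\frac{A}{D_a(t)} + \frac{B}{(\lambda_{aa} - \delta_a) - \delta_{ar}(\rho/\delta_r) D_a(t)}\right] = dt \tag{A1.2}$$

or

$$dD_a(t)\,\frac{A(\lambda_{aa} - \delta_a) - A\,\delta_{ar}(\rho/\delta_r) D_a(t) + B D_a(t)}{D_a(t)\left[(\lambda_{aa} - \delta_a) - \delta_{ar}(\rho/\delta_r) D_a(t)\right]} = dt. \tag{A1.3}$$

Equating coefficients results in the equations

$$A[\lambda_{aa} - \delta_a] = 1$$

$$A[-\delta_{ar}(\rho/\delta_r)] + B = 0.$$

Thus

$$A = \frac{1}{\lambda_{aa} - \delta_a} \tag{A1.4}$$

and

$$B = \frac{\delta_{ar}(\rho/\delta_r)}{\lambda_{aa} - \delta_a}. \tag{A1.5}$$

Therefore from (A1.2)

$$\frac{dD_a(t)}{D_a(t)} - \frac{dD_a(t)}{D_a(t) - (\lambda_{aa} - \delta_a)\delta_r/\delta_{ar}\rho} = (\lambda_{aa} - \delta_a)\,dt \tag{A1.6}$$

and integration yields

$$\ln\left[\frac{D_a(t)}{D_a(0)}\right] - \ln\left[\frac{D_a(t) - (\lambda_{aa} - \delta_a)\delta_r/\delta_{ar}\rho}{D_a(0) - (\lambda_{aa} - \delta_a)\delta_r/\delta_{ar}\rho}\right] = (\lambda_{aa} - \delta_a)t \tag{A1.7}$$

or

$$\ln\left[\frac{D_a(t)}{D_a(t) - (\lambda_{aa} - \delta_a)\delta_r/\delta_{ar}\rho}\right] - \ln\left[\frac{D_a(0)}{D_a(0) - (\lambda_{aa} - \delta_a)\delta_r/\delta_{ar}\rho}\right] = (\lambda_{aa} - \delta_a)t,$$

so that

$$\frac{D_a(t)}{D_a(t) - (\lambda_{aa} - \delta_a)\delta_r/\delta_{ar}\rho} = K_1 e^{(\lambda_{aa} - \delta_a)t} \tag{A1.8}$$

where

$$K_1 = \frac{D_a(0)}{D_a(0) - (\lambda_{aa} - \delta_a)\delta_r/\delta_{ar}\rho}. \tag{A1.9}$$

Consequently,

$$D_a(t) = \left[D_a(t) - (\lambda_{aa} - \delta_a)\delta_r/\delta_{ar}\rho\right] K_1 e^{(\lambda_{aa} - \delta_a)t} \tag{A1.10}$$

$$D_a(t) = \frac{\left[-(\lambda_{aa} - \delta_a)\delta_r/\delta_{ar}\rho\right] K_1 e^{(\lambda_{aa} - \delta_a)t}}{1 - K_1 e^{(\lambda_{aa} - \delta_a)t}} \tag{A1.11}$$

$$\phantom{D_a(t)} = \frac{\left[(\lambda_{aa} - \delta_a)\delta_r/\delta_{ar}\rho\right] K_1 e^{(\lambda_{aa} - \delta_a)t}}{K_1 e^{(\lambda_{aa} - \delta_a)t} - 1} \tag{A1.12}$$

if $C > D_a(t) > 0$, which is of the anticipated logistic form. We do not exhaustively
examine all possible cases.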


Models  for  Adduct  Damage  Fixation  in  the  Presence  of  Repair 




Solution of the Homogeneous Riccati Equation


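Let

$$\frac{dD_a(t)}{dt} = \lambda_{aa} D_a(t) - \delta_a D_a(t) - \delta_{ar}(\rho/\delta_r) D_a^2(t) \tag{A2.1}$$

given $D_a(0) > 0$. Separate:

$$\frac{dD_a(t)}{D_a(t)\left[\lambda_{aa} - \delta_a - \delta_{ar}(\rho/\delta_r) D_a(t)\right]} = dt. \tag{A2.2}$$

Employ partial fractions:

$$\left[\frac{1}{D_a(t)} + \frac{\delta_{ar}(\rho/\delta_r)}{\lambda_{aa} - \delta_a - \delta_{ar}(\rho/\delta_r) D_a(t)}\right] dD_a(t) = (\lambda_{aa} - \delta_a)\,dt. \tag{A2.3}$$

Integrate both sides:

$$\ln\left[\frac{D_a(t)}{D_a(0)}\right] - \ln\left[\frac{\lambda_{aa} - \delta_a - \delta_{ar}(\rho/\delta_r) D_a(t)}{\lambda_{aa} - \delta_a - \delta_{ar}(\rho/\delta_r) D_a(0)}\right] = (\lambda_{aa} - \delta_a)t. \tag{A2.4}$$

Rearrange:

$$\ln\left[\frac{D_a(t)}{\lambda_{aa} - \delta_a - \delta_{ar}(\rho/\delta_r) D_a(t)}\right] = \ln\left[\frac{D_a(0)}{\lambda_{aa} - \delta_a - \delta_{ar}(\rho/\delta_r) D_a(0)}\right] + (\lambda_{aa} - \delta_a)t \tag{A2.5}$$

so

$$\frac{D_a(t)}{\lambda_{aa} - \delta_a - \delta_{ar}(\rho/\delta_r) D_a(t)} = \frac{D_a(0)}{\lambda_{aa} - \delta_a - \delta_{ar}(\rho/\delta_r) D_a(0)}\, e^{(\lambda_{aa} - \delta_a)t}. \tag{A2.6}$$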